Journal of Applied Psychology 


Edited by Donald G. Paterson, University of Minnesota 


Consulting Editors 


George K. Bennett, Psychological Corporation 
Harold E. Burtt, Ohio State University 
Allen L. Edwards, University of Washington 
Clifford E. Jurgensen, Minneapolis Gas Co. 


James P. Porter, Claverack, New York 

Harold F. Rothe, Fairbanks, Morse and Co., 
Beloit, Wis. 

Julian B. Rotter, Ohio State University 

Edward K. Strong, Jr., Stanford University 


Irving Lorge, T. C. Columbia University 
Quinn McNemar, Stanford University 
Alexander Mintz, City College of New York 


Donald E. Super, T. C. Columbia University 
Morris S. Viteles, University of Pennsylvania 
Alfred C. Welch, Knox-Reeves, Minneapolis 





Table of Contents 


Social Status of Industries: A. H. Brayfield, C. E. Kennedy, Jr., and W. E. Kendall 
Manager-Employee “Understanding” in the Retail Grocery and Meat Market: P. V. Marchetti
An Experimental Evaluation of the Sensitivity of the Empathy Test: A. I. Siegel

The Validation of an “Indecision” Score for Prediction of Proficiency of Foremen: J. P. Guilford


An Approach to Isolating Dimensions of Job Success: L. L. McQuitty, C. Wrigley, and E. L. ...


The Analysis of an Experimental Job Evaluation System as Applied to Enlisted Naval Jobs:
E. J. McCormick and W. E. North


Comparability of Personal Attitude Scale Administration with Mail Administration With and 
Without Incentive: P. W. Maloney 


An Empirical Analysis of the Effectiveness of Psychological Warfare: T. G. Andrews, D. D. Smith 
and L. A. Kahn 


Predicting Achievement in Medical School: A Comparison of Preclinical and Clinical Criteria: 
R. Glaser and O. Jacobs 


Subscore Patterns on ACE Psychological Examination Related to Educational and Occupa- 
tional Differences: F. J. Di Vesta. 


The Effect of Methods of Presentation and Examining Conditions on Student Achievement in a 
Correspondence Course: F. J. Di Vesta 


The Use of Levels of Confidence in Item Analysis: V. Appel and D. Kipnis 


Some Sy Short-Cuts in the Development or Analysis of Tests: A. G. MacLean and 
A. T. Tait 


Some Relationships Between the MMPI and a Problem Checklist: R. F. Lockman 
Facilitating Legislative Research: H. A. Grace 


A Comparison of Two Methods of Measuring the Attention-Drawing Power of Magazine Ad- 
vertisements: J. Tiffin and D. M. Winick 

Applied Psychology in Action: 
Legal Status of Advertising and Marketing Psychology Experts 
Reporting Employment Test Scores to Supervisors: C. E. Jurgensen 

Book Reviews 





American Psychological Association 
Vol. 38, No. 4 


August, 1954 





Journal of Applied Psychology 


Published Bi-monthly by the American Psychological Association, Inc. 
Prince and Lemon Sts., Lancaster, Pa. 


Annual subscription, $7.00; single copies, $1.50 


Subscriptions and business communications should be sent to 
American Psychological Association 
1333 Sixteenth Street N.W. 
Washington 6, D. C. 


Articles for publication should be sent to the Editor-elect 


Dr. John G. Darley, Graduate School, University of Minnesota, 
Minneapolis 14, Minnesota. 


Authors should submit an original and one carbon (with one 
copy only of photographs or line drawings) and should retain in 


their possession a carbon copy. 





This Journal gives prompt consideration to 
manuscripts reporting original investigations in 
any field of applied psychology except clinical 
and consulting psychology. A descriptive or 
theoretical article is occasionally accepted if it 
deals in a distinctive manner with a problem of 
applied psychology. The policy is, however, to 
favor papers dealing with quantitative investi- 
gations of direct value to psychologists working 
in the following fields: Vocational diagnosis and 
occupational guidance; educational diagnosis, 
prediction and guidance at the secondary school 
level and higher; personnel selection, training, 
placement, transfer and promotion in business, 
industry and government service including the 
armed forces; supervisory training in business, 
industry and government; bio-mechanics or de- 
sign of machines to fit the human operator; il- 
lumination, ventilation and fatigue in industry; 
job analysis, description, classification and eval- 
uation; measurement of morale of executives, 
supervisors, or employees; surveys of opinion on 
social or political issues, such as those conducted 
by The Psychological Corporation; psychological 
problems in market research and in advertising. 


Articles may be under 500 words. The maximum
is 12,000 words, the average in the
neighborhood of 4,000 words. To reduce lag of
publication, adherence to the rule of “brevity 
consistent with clarity” is encouraged. 


A lapse of six to twelve months occurs between 
acceptance of an article and its publication, the 
lag varying with the rate at which manuscripts 
are submitted. If, however, an author is pre- 
pared to defray the costs of printing the neces- 
sary extra pages, he may arrange for earlier 
publication without thereby postponing the ap- 
pearance of manuscripts by other contributors. 
This enables the management to provide space in 
addition to the scheduled 64 pages per issue. 
“Early publication” is thus a direct contribution 
to the subscribers. By cutting down lag in pub- 
lication, it also benefits those authors whose 
articles are published in regular turn. 


Tables, footnotes and references as well as 
text of manuscripts should be typed double-spaced 
throughout. Authors should adhere to the conventions
described in the “Publication Manual
of the American Psychological Association,” 
Psychol. Bull., 1952, 49, No. 4, Part 2. A copy 
of the Manual will be loaned to any prospective 
contributor who does not find it in his library. 


Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3, 1879 


Acceptance for mailing at the special rate of postage provided for in ..., Section 34.40,
P. L. & R. of 1948, authorized October 1 


Copyright, 1954, by the American Psychological Association, Inc.













Social Status of Industries 


Arthur H. Brayfield 
Carroll E. Kennedy, Jr. 


Kansas State College 


and 


William E. Kendall
Chesapeake and Ohio Railway, Cleveland, Ohio 


In 1925, Counts demonstrated that occupa- 
tions may be arranged in order of social pres- 
tige (3). Greatest prestige is usually associ- 
ated with the professional and “higher” busi- 
ness occupations. Skilled trades, technical, 
and distributive occupations occupy an inter- 
mediate position followed by the semiskilled 
and unskilled occupations ranked at the bot- 
tom of the hierarchy. Research on the social 
status of occupations has continued and the 
hierarchical arrangement has been well estab- 
lished. The Counts study was repeated with 
minor variations 21 years later by Deeg and 
Paterson who found almost no change in the 
social status rankings during the intervening 
years (4). 

Does a social status hierarchy exist among 
industries? It occurred to the writers that an 
investigation of this question might be inter- 
esting and significant. A review of the litera- 
ture revealed no such studies. In this paper 
we report an exploratory attempt to ascer- 
tain: (1) whether or not an industrial hier- 
archy exists; and (2) the possible influence 
of occupational status stereotypes upon the 
identification of such a hierarchy. 


Method 


The base method for this investigation was 
a ranking procedure similar to that of the 
studies of occupational prestige hierarchies 
and closely patterned after Baudler and 
Paterson (1). An alphabetical list of 29 in- 
dustries was presented to 68 men and 52 
women members of the same class in General
Psychology with instructions to “rank
according to what you think their social 
standing is in your community or state.” At 
least one industry from each of the 9 major 
divisions in the Standard Industrial Classifi- 
cation Manual (5) was included. Competi- 
tive industries were included in a few in- 
stances as, for example, bus companies, air 
transport, railroads, and trucking companies. 
The respondents were predominantly college 
freshmen and sophomores representing 26 dif- 
ferent curriculums. The median rank and its 
quartile deviation were computed for each in- 
dustry and the industries were then placed in 
rank order according to their median values. 
The rank order correlation (rho) between 
men and women rankings was computed. 
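
The median rank and quartile deviation described above can be sketched in a few lines. This is a minimal illustration, not the authors' computation: the ranks are invented, the function names are hypothetical, and the quartile deviation is assumed to be the semi-interquartile range, (Q3 - Q1)/2, with quartiles taken by the "inclusive" method of Python's statistics.quantiles (the paper does not say how quartiles were interpolated).

```python
from statistics import median, quantiles


def quartile_deviation(ranks):
    """Semi-interquartile range: half the distance between Q1 and Q3 (assumed definition)."""
    q1, _, q3 = quantiles(ranks, n=4, method="inclusive")
    return (q3 - q1) / 2


def summarize(rankings):
    """rankings maps industry -> list of ranks given by respondents.

    Returns (industry, median rank, quartile deviation) tuples,
    ordered from highest status (lowest median) to lowest status.
    """
    rows = [(name, median(r), quartile_deviation(r)) for name, r in rankings.items()]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical ranks from five respondents (not data from the study).
toy = {
    "Banks": [2, 1, 3, 2, 4],
    "Laundries": [28, 27, 29, 25, 28],
    "Farming": [3, 12, 5, 20, 8],
}

table = summarize(toy)
for order, (name, med, qd) in enumerate(table, start=1):
    print(f"{order:>2}  {name:<10} median={med:>4}  Q={qd:.1f}")
```

In this toy data "Farming" has a middling median but a large quartile deviation, which is the pattern the deviation is meant to expose: respondents disagree about where the industry belongs.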

A subsidiary problem was to attempt to dis- 
cover whether or not respondents were influ- 
enced by the social status of a particular oc- 
cupational level stereotype which might be 
associated with any given industry. The 
method employed was to vary the instruc- 
tions to four additional groups of respondents 
who ranked the same list of industries. A 
total of 48 men and 76 women from classes 
in General, Educational, and Social Psychol- 
ogy responded to instructions to rank the 29 
industries “according to what you think the 
social standing of an executive in each of the 
industries is in your community or state.” 

An additional 48 men and 66 women from 
General and Educational Psychology ranked 
the industries under instructions to “rank according
to what you think the social standing
of a laborer in each of the industries is in
your community or state.” 


Table 1 

Rank Order of 29 Industries Based on Median Social Status Rankings by 68 Men and
52 Women College Students * 

                                                Men                           Women
                                     Rank    Median  Quartile         Rank    Median  Quartile
Industry                            Order   Ranking Deviation        Order   Ranking Deviation

Medical services                        1       2.1       1.9          ...       2.1       1.5
Banks                                   2       2.7       1.3          ...       2.6       1.5
Education                               3       4.8       2.7          ...       3.6       1.4
Federal government                      4       5.4       3.9          ...       4.0       2.8
Farming                                 5       8.5       8.7          ...       ...       6.4
Local government                        6      10.8       6.7          ...       8.5       5.3
Aircraft manufacturing                  7      11.5       4.7          ...      14.8       5.3
Broadcasting companies                  8      12.5       6.3          ...       9.5       3.7
Real estate companies                   9      12.5       5.5          ...      10.8       4.9
Air transport companies                10      13.2       3.8          ...      13.8       5.3
Electric light companies               11      13.2       4.8          ...      13.8       3.9
Automobile manufacturing companies     12      13.5       5.7          ...      17.5       5.6
General building construction          13      14.0       4.8          ...      14.2       6.6
Telephone companies                    14      14.0       5.0          ...      14.3       3.1
Chemical manufacturing companies       15      14.3       4.9          ...      14.5       6.6
Machinery manufacturing companies      16      14.5       5.4          ...      18.5       5.1
Food manufacturing companies           17      15.3       4.9          ...      16.5       3.9
Publishing companies                   18      15.8       6.2          ...      12.5       5.0
Motion picture companies               19      16.3       7.9          ...      13.5       5.8
Railroads                              20      16.7       6.8          ...      16.5       5.9
Retail drug companies                  21      18.5       4.6          ...      16.5       5.1
Furniture manufacturing companies      22      18.7       4.2          ...      20.4       4.1
Wholesale drug companies               23      19.3       3.6          ...      19.8       3.8
Hotels                                 24      21.0       4.9          ...      21.2       5.0
Oil drilling companies                 25      21.5       6.8          ...      22.0       5.8
Bus companies                          26      22.0       4.8          ...      19.5       4.9
Trucking companies                     27      23.7       3.9          ...      26.0       2.4
Laundries                              28      27.0       2.2          ...      27.4       2.0
Coal mining companies                  29      27.0       2.7          ...      28.2       1.6

* Median rankings and quartile deviations reported to one decimal place only although median rank orders 
and computation of rho’s were based on medians carried to two decimal places. 


The results for the latter four groups were 
treated statistically as for the two base 
method groups and intercorrelations among 
the three methods were computed by sex. 


Results 


The results of the rankings by the base 
method groups are shown in Table 1. The 
median rankings of industries distribute themselves
over a wide range (from 2 to 27) 
whereas chance responses would yield a 
clustering around the median value of 14.5. 

It is evident from inspection of the quartile 
deviations that there is much greater agree- 
ment on the industries ranked extremely high 
and low than on those ranked in the middle 
of the distribution. 

The correlational results by sex for the 
three ranking methods are summarized in 
Table 2. The correlations are all significant 
beyond the 1% level. The influence of oc- 
cupational stereotype is small since the cor- 
relations are of substantial magnitude. 
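
One way to see why coefficients of this size clear the 1% level with 29 industries is the common t approximation for a correlation coefficient. The paper does not state which test the authors used, so the approximation, the function name, and the tabled critical value are assumptions made for illustration; the rho values are those reported in this article.

```python
import math


def rho_to_t(rho, n):
    """t statistic for a correlation based on n pairs (n - 2 degrees of freedom)."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


N_INDUSTRIES = 29
# Two-tailed 1% critical value of t for 27 degrees of freedom,
# taken from standard t tables (an assumption about the test applied).
T_CRIT_01 = 2.771

for rho in (0.78, 0.81, 0.89, 0.90, 0.92, 0.93):
    t = rho_to_t(rho, N_INDUSTRIES)
    print(f"rho={rho:.2f}  t={t:.2f}  beyond 1% level: {t > T_CRIT_01}")
```

Even the smallest of these coefficients gives a t well above the critical value, so the significance statement would hold under this approximation.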

Men and women agreed markedly in their 
status rankings irrespective of method. For 
the base method the rho was .90, for “Executive,”
.90, and for “Laborer,” .93. 

The existence of an industrial status hier- 
archy seems to be well established by the re- 
sults from the administration of the three 
lists. The assignment of ranks is obviously 
not a chance phenomenon and is relatively 
uninfluenced by sex. 


Table 2 

Intercorrelations (rho) between Three Methods of 
Ranking 29 Industries on Social 
Status, by Sex 

Note: In upper right-hand half of the table the intercorrelations
for the rankings by men are given and in 
the lower left-hand half of the table the intercorrelations
for the rankings by women are given. 


Method           Base    “Executive”    “Laborer”

Base               -         .89           .89
“Executive”       .78         -            .81
“Laborer”         .92        ...            -


The determinants of such a prestige hier- 
archy are obscure. For example, the high 
rank accorded to farming by all groups in 
this study may reflect a geographical factor. 
We attempted to ascertain the influence of a 
possible occupational level stereotype upon 
the rankings but found little influence within 
the limitations of the method used. Since the 
operation of at least a white collar-blue collar 
stereotype seems probable from an inspection 
of the rankings a more intensive study of this 
factor might well be undertaken. 

There were interesting differences within a 
broad industrial classification. For example, 
furniture manufacturing did not rank higher 
than 20th on any of the lists while aircraft 
manufacturing ranked below 8th on only one 
of the six rankings. In the field of trans- 
portation, bus companies and trucking com- 
panies consistently ranked well toward the 
bottom while other forms of transportation 
enjoyed a considerably higher status. On the 
other hand, there was no reliable differentiation
between electric light and telephone companies
selected as representative of utilities. 

The findings of this study should be of in- 
terest to several groups. A few industries 
have demonstrated their concern for public 
opinion by conducting confidential surveys of 
the public’s attitude toward them. The so- 
called institutional advertising campaigns are 
further evidence of this concern. Personnel 
workers are aware of the influence of public 
opinion on their recruiting programs (2, p. 
88). Further, the prestige associated with an 
industry may be a factor in job satisfaction. 

Vocational counselors should be alert to the 
possible influence of the industrial status hier- 
archy on the vocational plans of their coun- 
selees. The methodology employed is poten- 
tially useful to placement officers in schools, 
colleges, and public and private employment 
offices. 


Summary 


The existence of a prestige hierarchy among 
industries was established through the use of 
a ranking method employed with college un- 
dergraduates representative of a variety of 
curriculums. The influence of occupational 
level stereotypes was studied and found to be 
negligible for the populations studied and the 
method used. 


Received August 27, 1953. 



Manager-Employee “Understanding” in the Retail Grocery 
and Meat Market 


Pietro V. Marchetti 


University of Illinois 


We have chosen, somewhat arbitrarily, the 
term “understanding” as a label for the trait 
or ability of interest to us in this study. This 
term has been used previously by others as 
we are using it. Still other investigators have 
used the words empathy, social psychological 
empathy, and social perception. The ability 
we are interested in is that of being able to 
place one’s self in the position of another. 
We have taken as an indicant of it simply 
the accuracy with which one person is able to 
predict the responses that another will make 
to some given stimulus situation. The now 
very popular technique, and the one we have 
employed, is to have one person predict the 
responses of another to some paper-and-pencil 
device. Rating scales and personality ques- 
tionnaires are instruments that may be so 
employed. 


In much common sense speculation about 
determiners of effective interpersonal relations 
we find this ability to take another’s role (or 
to take another’s point of view) suggested as 


an important one. Speculations of the social 
scientist, too, suggest such an ability as an 
important one in interpersonal relations. The 
kind of interpersonal relation of interest to us 
in the present study is that between the for- 
mal face-to-face leader and his followers (or 
subordinates) in the work (or job) situation. 
Stogdill, in his 1948 survey of leadership 
studies in which some attempt had been made 
to determine the traits or characteristics of 
leaders, concluded that this approach to the 
study of leadership was an inadequate one. 
“[Leadership] appears rather to be a working
relationship among members of a group, 
in which the leader acquires status through 
active participation and demonstration of 
his capacity for carrying cooperative tasks 
through to completion. Significant aspects of 
this capacity appear to be intelligence, alert- 
ness to the needs and motives of others, and 
insight into situations, . . .” (16, p. 66). 


Gibb has prepared a survey of those leader- 
ship studies emphasizing the interactional re- 
lationship between the leader’s traits and the 
characteristics of the particular situation in 
which he functions. Gibb writes, “The func- 
tion of the leader is to embody and give ex- 
pression to the needs and wishes of the group 
and to contribute positively to the satisfac- 
tion of these needs” (6, p. 20 f.). Roethlis- 
berger (13) and Barnard (1), among many 
others who have written in the area of indus- 
trial leadership, point to such an ability as 
an important one in effective leadership. We 
might note parenthetically that Barnard, a 
professional industrial manager, wrote in 
1940, “Leadership appears to be a function 
of at least three complex variables—the in- 
dividual, the group followers, the conditions” 
(1, p. 16). From his observations in the in- 
dustrial enterprise he arrived at a statement 
about leadership quite in accord with the psy- 
chologist’s interactional theories of leadership 
which have replaced earlier trait theories. 

One psychological study we would note 
briefly, which served as a major impetus to 
our own work, is that of Chowdhry (4). She 
has suggested situational-traits—traits com- 
mon to leadership and yet a function of the 
situation. She found, in general, that the 
sociometric leader in the groups she studied 
(primarily college student groups) could 
make more accurate judgments or estimates 
of group opinion than could the non-leaders, 
defined sociometrically. 

The studies of Meyer (12) and of Cantor 
(3) are two studies, from the pertinent litera- 
ture, very closely related to our own. Meyer 
studied 200 first-line supervisors in a large 
utility company. He asked the supervisors 
to predict the behavior of other persons, who 
had been described briefly for them, in cer- 
tain interpersonal situations. There was evi- 
dence that the better supervisors regarded 
others as individuals with motives, feelings 
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and goals of their own. The poorer leader 
was more likely to perceive others in relation 
to his own motives or goals. Cantor did an 
experimental study of a human relations pro- 
gram in the Farm Bureau Insurance Com- 
panies in Ohio. One of his findings was that 
following the supervisory training conferences 
the supervisors showed gains in scores on a 
test of the ability to estimate group opinion. 
Lastly, in this sampling of earlier studies 
which relate to the suggested factor of the 
leader's understanding of the followers there 
are two University of Michigan Survey Re- 
search Center studies of productivity, super- 
vision and morale. One of these was carried 
out in a large life insurance firm (7) and the 
other with gangs of men who maintain sec- 
tions of railroad right of way (8). In each 
of these studies there was evidence that em- 
ployee groups of higher productivity were un- 
der supervisors or foremen who were more 
employee centered. There was evidence of 
their taking more interest in their employees 
than did the leaders of lower productivity 
units. There was some evidence that they 
considered the possible needs and motives of 
their employees in their interpretations of the 
behavior of employees. This last finding may 
be compared with the results of a study of 
Mass (11). He reports that the leaders of 
youth groups sponsored by community agen- 
cies, after taking courses intended to make 
for more effective leadership of youths, made 
more of what he calls causal reactions rather 
than judgmental reactions to the behavior of 
the youths. He gives as an example of a C- 
reaction, “Joe is smoking a pipe perhaps be- 
cause he is the smallest boy in the group or 
perhaps to rebel against paternal sanctions 
against smoking”; and as a J-reaction, “Joe is 
a bad boy,” or “Joe’s only fault is smoking.” 

Granting as a factor in effective leadership, 
the leader’s ability to understand the group 
members, a number of questions about this 
ability quickly arise. One is suggested by Sol 
Levine’s (9) discussion of leadership. There 
may be a curvilinear relationship between un- 
derstanding and leadership effectiveness. With 
too limited understanding we may have Le- 
vine’s formalistic leader who is not very suc- 
cessful in motivating the group members and 
eliciting from them a genuine contribution of 
their efforts to the group’s tasks. The opposite
extreme of understanding may make 
for Levine’s anarchic leader acutely sensitive 
to the feelings of the group members but in- 
capacitated by his lack of ability to abstract, 
to see beyond the concrete. Apart from the 
question of optimal amount of understanding 
there is also the question about the kind of 
understanding. That is to say, there are 
many different things that one might know 
about another person—many different aspects 
of another’s personality that one might un- 
derstand or know about. It is reasonable to 
assume that the various possible kinds of un- 
derstanding that one might have of another 
are not equally important in the leader-fol- 
lower relationship in the job situation. Perti- 
nent to this question is work of Luszki (10) 
on empathic ability and social perception. 
She presents evidence of some independence 
between the ability of one person, A, to pre- 
dict the responses of another, B, to a stimu- 
lus situation not involving A, and A’s ability 
to predict the responses of B to a stimulus 
situation which does involve A. She speaks 
of detached as compared with participant ob- 
server skill. We shall borrow these terms 
and use them analogously as adjectives for 
understanding. A third question we ask is 
that of differences among work situations in 
terms of the degree to which effective leader- 
ship in the situation is determined by or as- 
sociated with the leader’s understanding of 
the group members. Where the group tasks 
are such that individuals function more as 
automatons the leader’s understanding of the 
group members may be of less importance in 
leadership effectiveness, particularly so if we 
take some aspect of group productivity as a 
criterion of leadership effectiveness. There is 
a fourth and final question we would raise at 
this point. This has to do with the relation 
of “apparent” to “real” understanding. The 
latter is the sort of understanding with which 
we are concerned in the present study. This 
is related to the amount and kind of knowl- 
edge that one person has of another which 
makes it possible for him to make an accurate 
prediction of how the other person will re- 
spond to a given situation. The person A 
may have such understanding of B. B, however,
may not have such understanding of C 
and yet he may appear to. A, with the 
knowledge or understanding that he has of B 
can select, with a minimum of trial and error, 
the appropriate stimulus situation with which 
to confront B in order to elicit a given kind 
of response from B. B on the other hand 
does not have such understanding of C. 
Nevertheless, he is able to elicit, from C, a 
desired response. B’s success in this may be 
primarily a matter of trial and error. He 
may be able to confront C with one stimulus 
situation after another noting very quickly 
any immediate cues that C may be giving 
him, on the basis of which B can predict 
what C’s more complete response to the 
stimulus situation would be. On this basis 
B can decide if it will be necessary to pre- 
sent C with still some other stimulus situa- 
tion in order to elicit the desired response or 
not. Another recognition of this problem is 
to be found in a discussion of leadership by 
Smith (15). 

The task we have set for ourselves is that 
of making a more frontal attack upon the 
problem of leader-follower (and follower-leader)
understanding in the job situation. 
We hope to determine in various kinds of job 
situations those variables in the job situation 
which are correlates of the understanding (or 
rather, understandings) between employee 
and immediate supraordinate. We are prin- 
cipally concerned, of course, with the degree 
to which such understandings may correlate 
with job satisfaction, that is, attitudes of 
rank-and-file toward various aspects of the 
job situation; with various criteria of leader- 
ship effectiveness; and finally with group ef- 
fectiveness (or productivity) and group effi- 
ciency. We would expect such understanding 
to be more highly correlated with group effi- 
ciency than with group effectiveness. As we 
have noted earlier, effectiveness or produc- 
tivity in the job situation is today, many 
times, more a function of technological fac- 
tors rather than of the kind of interpersonal 
relationship between worker and immediate 
supraordinate. The obvious difficulty in at- 
tempting to demonstrate the correlation be- 
tween understanding and group efficiency is 
that of developing an adequate indicant of 
efficiency of the group—an indicant that 
would reflect the psychological costs, to the 
individual worker, of the work accomplished 
by him. We are thinking here of Ryan’s dis- 
cussion of cost of work to the individual in 
his Work and effort (14). We have also in 
mind Barnard’s discussion of the relation of 
efficiency of the group to the individual effi- 
ciencies of its members, in his The functions 
of the executive (2). It is hoped that ulti- 
mately such studies might contribute to more 
effective training as well as selection of per- 
sons to serve in supervisory capacities in 
various work situations. The results of such 
studies may also contribute to more effective 
matching of employee and supervisor. We 
may eventually be able to consider in the 
placement of personnel an additional variable 
and that would be the degree to which the 
employee might be expected to be enigmatic 
to a particular supervisor (or the supervisor 
enigmatic to the employee). The objective, 
of course, would be to so match employee 
and supervisor that there might be adequate 
understanding one of the other. 


Procedure 


Subjects. In the present study the subjects 
were the rank-and-file employees and managers 
of ten grocery retail units and two retail meat 
market units of a midwestern grocery “chain.” 
One of the grocery units is excluded from the 
data analysis. It was the first unit in which we 
collected our data. It soon became apparent in 
this unit that the questionnaires we had selected 
for our study were too lengthy. After making 
modifications of our questionnaires the study was 
continued in the remaining units. The data col- 
lected in the first unit, of course, remain incom- 
parable to those gathered in the other units. The 
number of employees in the eleven units upon 
which this report is based, varies from three to 
42. More specifically, Unit A had 17 employees; 
B, 21; C, 3; D, 4; E, 6; F, 4; G, 8; H, 12; I, 
42; J, 8; and K, 14. The Units J and K are 
meat markets. 

Measures of Understanding. As indicated 
earlier, the measures, generally, are statements 
of the accuracy of one person’s predictions of 
the responses of another to a questionnaire. A 
predicts the responses of B to the items of a 
questionnaire. For each item there are several 
response categories, in some order, from 1 to 5. 
If B chooses the response category 4, and A pre- 
dicts that B’s choice will be 2, we shall say that 
A’s error for that item is 2. The direction of 
the error was ignored in the present study. One 
can then determine for A, the mean error score; 
that is, the mean of the errors made on each 
item. All of our understanding measures are 
just such mean error scores. 
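
The mean error score just defined amounts to a mean absolute difference between predicted and actual response categories. The sketch below illustrates that reading with invented responses; the function name and the five-item questionnaire are hypothetical, not taken from the study's instruments.

```python
def mean_error_score(predicted, actual):
    """Mean absolute difference between predicted and actual response categories.

    Direction of error is ignored, as in the study; a lower score means
    more accurate prediction.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty response lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical five-item questionnaire with response categories 1 to 5.
b_actual = [4, 2, 5, 3, 1]      # B's own responses
a_predicted = [2, 2, 4, 5, 1]   # A's predictions of B's responses

# Item errors are 2, 0, 1, 2, 0, so the mean error is 5 / 5.
print(mean_error_score(a_predicted, b_actual))
```

The first item reproduces the example in the text: B chooses category 4, A predicts 2, and the error for that item is 2.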

The four measures of understanding were the 
manager’s detached understanding of the em- 
ployee (MDU), the manager’s participant under- 
standing of the employee (MPU), the employee’s 
detached understanding of the manager (EDU), 
and the employee’s participant understanding of 
the manager (EPU). For each employee in each 
unit we determined an MDU, MPU, EDU, and 
EPU. The MDU is the mean number of errors 
made by the manager in his predictions of the 
employee’s responses to the items of the Tear 
Ballot for Industry. The MPU is the mean 
number of errors made by the manager in his 
predictions of the responses of the employee to 
a questionnaire we have labeled Supervisory 
Practices Questionnaire. This is simply a short- 
ened form of an instrument developed by Fleish- 
man (5). The employee’s responses to the items 
of this questionnaire indicate how the employee 
thinks that his manager typically behaves to- 
ward his employees. A sample item is, “He 
criticizes people under him in front of others.” 
The response categories are: 1. Often; 2. Fairly 
often; 3. Occasionally; 4. Once in a while; and 
5. Very seldom. The EDU is the mean number 
of errors made by the employee in his predic- 
tions of the manager’s responses to the Super- 
visory Practices Questionnaire with the items so 
reworded that the manager’s responses indicate 
how he thinks that he, the manager, typically 
behaves toward his employees. The EPU is the 
mean number of errors made by the employee in 
his predictions of his rating by the manager. 
The manager rated each employee on each of 
the following seven characteristics: (1) how the 
employee receives orders and suggestions; (2) 
customer relations; (3) initiative; (4) accept- 
ance by fellow workers; (5) promotability; (6) 
personal appearance; and (7) general effective- 
ness in present job. These characteristics were 
suggested in descriptions of poor and good em- 
ployees which were obtained in interviews with 
two of the managers. 

In each of the eleven units we determined, for 
each of the four understanding measures, a split- 
half (odd-even) reliability to which we applied 
the Spearman-Brown formula to obtain an esti- 
mate of the reliability of a test doubled in length. 
For each of our four measures, then, there were 
eleven estimates of reliability, one obtained in 
each unit. The median reliability estimates for 
the measures MDU, MPU, EDU and EPU are, 
respectively, .78, .82, .79, and .83. 
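
The Spearman-Brown step-up described above can be sketched as follows. This is a minimal illustration rather than the computation actually carried out in the study: the function name `spearman_brown` and the sample odd-even correlation are assumptions chosen only for the example.

```python
def spearman_brown(r_part: float, k: float = 2.0) -> float:
    """Estimated reliability of a test lengthened k times,
    given the correlation r_part between comparable parts."""
    return k * r_part / (1.0 + (k - 1.0) * r_part)


# Hypothetical odd-even (split-half) correlation for one unit.
r_odd_even = 0.64

# Reliability estimate for the test doubled in length (k = 2).
print(round(spearman_brown(r_odd_even), 2))  # 0.78
```

With k = 2 the formula reduces to 2r / (1 + r), the form applied to split-half correlations.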


Results 


Our results are given in the form of rank- 
order coefficients of correlation between each 
of the four measures of understanding and 
(a) the manager’s ratings of the employees; 
(b) the evaluation of the manager by the employees 
on the Supervisory Practices Questionnaire; 
(c) the job satisfaction of the employees 
as measured by the Tear Ballot; and 
(d) the efficiency of the unit as evaluated 
subjectively by a member of management 
supraordinate to the unit managers. Each 
correlation coefficient is based upon eleven 
cases; the eleven units ranked in terms of 
the mean MDU of the unit, the mean MPU, 
the mean EDU, and the mean EPU. The 
units were, of course, also ranked in terms 
of the mean ratings by the manager of em- 
ployees in the unit; the mean evaluation of 
the manager by the employees in the unit; 
the mean job satisfaction of the employees; 
and finally the units were placed in a rank 
order of efficiency by the managers’ supra- 
ordinate. 

There were no well founded hypotheses 
about the direction of correlation between 
the understanding measures and the ratings 
of employees nor about the direction of cor- 
relation between these measures and the em- 
ployees’ evaluations of the managers. For 
this reason the so-called two-sided test of 
significance of the correlation coefficient is 
considered appropriate. For eleven cases the 
rank-order coefficient must be .60 for signifi- 
cance at the five per cent level of confidence 
and .74 for significance at the one per cent 
level. We did hypothesize positive correla- 
tions between the measures of understanding 
and employee satisfaction as well as between 
these measures and the efficiency ratings of 
the units. These relationships are suggested 
both by common sense speculation and earlier 
empirical studies. To test the significance of 
these correlations we have used the one-sided 
test of significance. For eleven cases the 
rank-order coefficient must be .54 for signifi- 
cance at the five per cent level of confidence 
and .73 for significance at the one per cent 
level. These results are summarized in 
standard type, in Table 1. 
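
The testing procedure described above can be sketched as follows, assuming untied ranks. The critical values are the ones quoted in the text for eleven cases; the function names and the two rank lists are hypothetical and serve only as illustration.

```python
def rank_order_rho(ranks_x, ranks_y):
    """Rank-order (Spearman) coefficient for untied ranks:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("need two rank lists of equal length >= 2")
    d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Critical values for eleven cases, as quoted in the text.
CRITICAL = {
    ("two-sided", 0.05): 0.60,
    ("two-sided", 0.01): 0.74,
    ("one-sided", 0.05): 0.54,
    ("one-sided", 0.01): 0.73,
}


def is_significant(rho, test, level):
    """Compare rho with the quoted critical value.
    The one-sided test assumes a positive correlation was predicted."""
    value = abs(rho) if test == "two-sided" else rho
    return value >= CRITICAL[(test, level)]


# Hypothetical ranks of eleven units on two variables.
understanding = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
satisfaction = [2, 1, 4, 3, 7, 5, 6, 10, 8, 11, 9]

rho = rank_order_rho(understanding, satisfaction)
print(round(rho, 2), is_significant(rho, "one-sided", 0.05))
```

A coefficient of .56, for instance, passes the one-sided five per cent criterion (.54) but not the two-sided one (.60), which is why the choice of test matters for the comparisons that follow.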

It quickly becomes apparent from the data 
in Table 1 that there is no significant correlation 
between any of the understanding measures 
and either the ratings of employees or 
evaluations of managers by employees. Employee 
job satisfaction, on the other hand, 
does seem to have some correlation with the 
manager’s detached and participant understanding 
of the employees. The correlation 
between MDU and job satisfaction is significant 
at the five per cent level (using the one-sided 
test) and the MPU and job satisfaction 
correlation very closely approaches significance 
at the five per cent level. There is but 
the suggestion of correlation between EDU 
and job satisfaction. 

Table 1 
Correlations Between Understanding Scores and (1) Employee Ratings by the 
Manager; (2) Evaluation of the Manager by the Employees; (3) Job 
Satisfaction of the Employees; and (4) Efficiency of the Retail Unit 

                                       Eleven       Nine grocery 
                                       units        units 
Employee Ratings and 
    MDU                                 .40          .52 
    MPU                                 .08          .16 
    EDU                                 .07          .43 
    EPU                                −.10          .08 
Evaluation of Manager and 
    MDU                                 .00         −.26 
    MPU                                −.02         [illegible] 
    EDU                                 .37         [illegible] 
    EPU                                 .07         [illegible] 
Job Satisfaction of Employees and 
    MDU                                 .56          .55 
    MPU                                 .53         [illegible] 
    EDU                                 .43          .18 
    EPU                                 .13          .12 
    MDU/SL                              .07          .13 
    MPU/SL                              .72          .74 
Efficiency of the Retail Unit and 
    MDU                                 .20         [illegible] 
    MPU                                 .55         [illegible] 
    EDU                                [illegible]  [illegible] 
    EPU                                 .52         [illegible] 
    MDU/SL                              .55         [illegible] 
    MPU/SL                              .61          .63 

Coefficients in the first column are based on all eleven units (standard 
type in the original); those in the second column are based on the nine 
grocery units only (italics in the original). 

In each unit we were able to identify one 
to three people as the one(s) receiving the 
greater proportion of choices or votes on a 
sociometric questionnaire. The sociometric 
criterion question asked of each employee in 
each unit, was answered by the employee's 
singling out the one of his fellow employees 
whom he would most like to have go with him 
if he were to be transferred to another unit 
in the same company. In each unit we de- 
termined the mean MDU and MPU based 
not upon all of the employees in the unit, as 
we did originally, but now based only upon 
the most frequently chosen persons in the 
unit. We may think then of the manager’s 
detached as well as participant understanding 
of the sociometric leaders in his 
unit—MDU/SL and MPU/SL, respectively. 
MDU/SL has no significant correlation with 
the mean job satisfaction (the mean of all 
employees’ job satisfaction scores, as origi- 
nally determined). The MPU/SL, however, 
does correlate significantly with job satisfac- 
tion, almost at the one per cent level. In 
the present study, then, we find that the 
greater the accuracy of the unit managers in 
predicting how they are evaluated by those 
employees in their respective units, who are 
most frequently chosen on a sociometric 
questionnaire, the greater the mean job satis- 
faction of the unit as a whole. 

The efficiency ratings of the units by the 
managers’ supraordinate correlate significantly 
(at the five per cent level) with MPU, 
MDU/SL, and MPU/SL. Their correlation 
with EPU closely approaches significance at 
the five per cent level. 

Turning again to Table 1 and noting the 
italicized coefficients we find that these values 
(with two or three exceptions) are relatively 
of the same order of magnitude as the coeffi- 
cients discussed earlier. The coefficients in 
italics are based upon the nine grocery units. 
The two meat markets are excluded. Obvi- 
ously, with the change in the number of cases 
the values of correlation coefficients for the 
two levels of confidence, which we cited 
above, do not apply here. The major dif- 
ferences between the two sets of coefficients 
are that the correlations of MDU and EDU 
with the employee ratings more closely ap- 
proach statistical significance with the meat 
markets excluded; and the correlation be- 
tween EPU and the efficiency ratings of the 
units becomes appreciably greater. However, 
these differences do not appear to be such 
that they suggest that the grocery and meat 
market units differ significantly in terms of 
the relationships explored in this study—the 
relationships between measures of under- 
standing and employee ratings, manager 
evaluations by employees, employee job 
satisfaction, and ratings of efficiency of the 
units. The sample of meat markets num- 
bering but two made it impracticable to test 
this statistically. It is proposed to explore 
this matter further with additional grocery 
and meat market units, preferably including 
units from other companies in order that 
there may be some test of the generality of 
the relationships that did emerge—or were 
suggested by the results of the present study. 

Our results appear to us to suggest that 
some further work along the general lines of 
this study is warranted. We should like to 
employ the same as well as some different 
measures of understanding between leader 
and follower in different work situations. It 
would seem profitable to use other than a 
global measure of job satisfaction; that is, 
measures of different aspects of job satisfac- 
tion (or morale) in order to determine how 
different measures of understanding may re- 
late differently to the various factors of job 
satisfaction. We should also like to explore 
the relationships between measures of under- 
standing and productivity as well as effi- 
ciency of the group. We have certain reser- 
vations about the efficiency ratings of the 
present study. We do know that the man- 
agers’ supraordinate who did the ratings felt 
that he should not be too influenced by dif- 
ferences among the units in terms of net 
profits. There are factors determining these 
profits which are beyond the control of the 
personnel of the unit. A principal one is that 
at higher levels of management it may be de- 
cided to price merchandise differently in dif- 
ferent units, in attempts to determine opti- 
mum prices. The rater reported as one basis 
of his judgments, the criticisms, favorable and 
unfavorable, made by customers to the com- 
pany about the service and personnel of the 
different units. The rater also considered the 
suggestions originating with the personnel of 
the different units for the improvement of the 
operation of the units. Another consideration 
was the physical appearance of the store—its 
cleanliness and the effectiveness of the dis- 
plays of merchandise. 

Summary 

Certain earlier studies suggesting the pres- 
ent one have been reviewed very briefly. 
Several measures of understanding between 
manager and employees in the retail grocery 
and meat market have been described. The 
correlations of these measures with the following 
variables have been reported: (1) 
manager’s rating of the employees; (2) employees’ 
evaluation of the manager; (3) job 
satisfaction of the employees; and (4) rat- 
ings of efficiency of the units. None of the 
measures of understanding correlated signifi- 
cantly with either the first or second of the 
above variables. Certain ones of the under- 
standing measures did correlate significantly 
with the third and fourth variables. 

An Experimental Evaluation of the Sensitivity of the 
Empathy Test 


Arthur I. Siegel 
Institute for Research in Human Relations, Philadelphia, Pa. 


The Empathy Test (1) is now of interest 
because of the recently reported high correlations 
(2) of this test with sales managers’ merit rankings 
of automobile salesmen (r = .71) 
and with actual sales records 
of automobile salesmen (r = .44). The test 
correlated (3) as follows with six criteria of 
success for union business agents: record for 
settling grievances and disputes, r = .64; re- 
cruitment of new members, r = .60; per cent 
vote received in union elections, r = .38; en- 
forcement of rules and regulations, r = .44; 
leadership rank, r = .67; knowledge of su- 
pervisory principles, r = .55. The multiple R 
with these six criteria was .76. 

The authors of The Empathy Test define 
empathy in the following way: “This unique 
talent, well known among ‘natural’ leaders, 
successful sales managers, and outstanding 
counselors, is the ability to ‘put yourself in 
the other person’s position, establish rapport, 
and anticipate his reactions, feelings, and be- 
haviors.’ This ability is known as empathy, 
except that the past accepted definitions of 
empathy seem somewhat inadequate since 
they stress mere identity of feeling and omit 
the practical element of prediction of the 
other’s behavior . . . individuals who are su- 
perior in empathetic ability are persons who 
are above average in understanding and an- 
ticipating reactions of other people” (empha- 
sis ours). 

The Empathy Test consists of three sec- 
tions. In the first section the respondent is 
asked to rank the popularity of 14 musical 
types (polkas, classicals, waltzes, etc.) with 
non-office factory workers of the United 
States. In the second section, the respondent 
ranks the popularity of 15 magazines with 
the average American, and in the third sec- 
tion the respondent ranks the annoyance 
magnitude of 15 experiences (a boisterous 
person attracting attention, hearing a person 
chewing gum, seeing a person’s nose running, 
etc.) to persons aged 25-39. Thus, in all of 
the sections the respondent is asked to reply 
not as he would answer, but as the average 
person would perform the ranking, and from 
these rankings an empathy score is derived. 

Although some low correlations have also 
been reported by Kerr and his co-workers 
(1), in view of the high correlations obtained 
it seemed that some independent experimental 
evaluation of The Empathy Test was war- 
ranted. Assuming the validity of The Empa- 
thy Test and assuming that clinical psycholo- 
gists are higher on empathy than experimen- 
tal psychologists, then clinical psychologists 
should score higher on The Empathy Test 
than experimental psychologists. This as- 
sumption for clinical psychologists does not 
seem to be outside the scope of the definition of 
empathy as given by the authors of The Em- 
pathy Test, and seems tenable to the present 
author. 


Method 


Form A of The Empathy Test was dis- 
tributed by mail to 50 “fellows” of the Divi- 
sion of Experimental Psychology and 50 
“fellows” of the Division of Clinical and Ab- 
normal Psychology of the American Psy- 
chological Association. The sample was ob- 
tained by taking every fifth “fellow” listed 
in the 1951 A.P.A. Directory in the Division 
of Experimental Psychology and every tenth 
“fellow” in the same directory in the Division 
of Clinical and Abnormal Psychology until a 
total of 50 names in each division were ob- 
tained. In some instances, no clear address 
was listed and in that case the name appear- 
ing directly below the ordered name was used. 
A total of 36 of the forms were returned by 
the “experimentalists.” Of these, only 34 
were completely filled out and one was re- 
ceived after our data were already analyzed. 
Thus, our total N for experimentalists was 
33. A total of 25 out of 26 of the forms returned 
by the “clinicians” were usable (N = 25). 
None of the subjects were informed of 
the purpose of the experiment until after all 
of the forms used in the comparison had been 
returned. 

Table 1 
Means and Sigmas of Clinicians and Experimentalists on Empathy Test 

                      Mean     Sigma 
Clinicians            87.7     14.7 
Experimentalists      86.7     18.1 


Results 


The Empathy Tests were scored and means 
and standard deviations calculated. These 
data are presented in Table 1. 

The mean Empathy Test score for “ex- 
perimentalists” was 86.7 while the mean Em- 
pathy Test score for “clinicians” was 87.7. 
The difference between the means is not sig- 
nificant. The mean scores obtained would 
place both the “clinicians” and the “experi- 
mentalists” at the 70th percentile on The 
Empathy Test’s norm for college men. 
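
The size of this difference relative to its sampling error can be checked from the summary figures alone. The sketch below is an illustration under stated assumptions: the paper does not say which significance test was used, so an unequal-variance (Welch) t statistic is swapped in here, and the reported sigmas are treated as sample standard deviations. The function name `welch_t` is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and approximate degrees of freedom for the
    difference between two independent means, variances not pooled."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Means, sigmas and Ns as reported: clinicians first, experimentalists second.
t, df = welch_t(87.7, 14.7, 25, 86.7, 18.1, 33)
print(round(t, 2))  # 0.23
```

A t of about 0.2 on some fifty-odd degrees of freedom falls far short of the value of roughly 2.0 needed at the five per cent level, which agrees with the statement that the difference is not significant.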

Since liberal arts female students score 
lower on The Empathy Test than liberal arts 
males, and since 14 female “clinicians” were 
sent the test while only one female “experi- 
mentalist” received the questionnaire, the ob- 
jection may be raised that this sampling dif- 
ferential operated so as to bias the scores of 
the groups in favor of the “experimentalists.” 
However, if this were the case, the variance 
of the clinical group should have been greater 
than the variance of the experimental group. 
The reverse was true. 

All of this might indicate that The Em- 
pathy Test either measures something other 
than empathy, measures empathy plus an- 
other variable, or is not a sensitive instru- 
ment. 

An alternative explanation has been advanced 
by Kerr, who kindly reviewed an 
early form of the present paper. Kerr points 
out the possibility that the better clinicians, 
possessing a vested interest, may have been 
more defensive about “going out on a limb” 
and thus the better clinicians may not have 
returned the forms. This sampling differ- 
ential may have acted to lower the empathy 
scores of the “clinicians.” The present author 
feels that this explanation is unwarranted 
in view of the fact that neither group 
was informed of the purpose of the research 
until after the forms were returned. If the 
clinicians were unaware of the purpose of the 
research, there was little reason for the bet- 
ter clinicians to believe that they were “go- 
ing out on a limb,” and thus withhold the re- 
turning of their forms. 

In fairness to the authors of The Empathy 
Test, we would like to point out that they 
have never claimed that it will distinguish 
between clinical and experimental psycholo- 
gists. Moreover, the assumption that clinical 
psychologists are higher on empathy than ex- 
perimental psychologists was our assumption. 


Summary 


The Empathy Test was submitted by mail 
to a group of experimental and a group of 
clinical psychologists. Assuming that the 
“clinicians” are higher on empathy than the 
“experimentalists,” The Empathy Test did 
not reflect this difference. 

The Validation of an “Indecision” Score for Prediction of 
Proficiency of Foremen 

The results to be reported briefly here are 
essentially negative, but perhaps negative re- 
sults should be reported more often than they 
are. On the one hand such a report may 
save another investigator from entering the 
same blind alley. On the other hand it may 
give another investigator an idea for doing a 
similar study in a modified way which will 
lead to positive results. 

The study is also opportunistic, in the 
sense that it was not planned in advance but 
was possible as a byproduct of another study. 
The writer happened to have at his disposal 
the answer sheets from more than 400 fore- 
men in an eastern industrial plant, these fore- 
men having taken the three personality in- 
ventories, STDCR, GAMIN, and Personnel 
Inventory (of factors O, Ag, and Co).¹ The 
writer had also been supplied with ratings of 
general proficiency of the same foremen as 
judged by their immediate superiors. Un- 
fortunately, details we should like to have 
concerning the administration of the inven- 
tories and the way in which the ratings were 
obtained are seriously lacking. It can only 
be said that the inventories were adminis- 
tered after the foremen were employed and 
that the ratings were on a five-point scale, 
with efforts made to disperse the frequencies 
toward a normal distribution. There is no in- 
formation concerning reliability or validity of 
the ratings. We can only assume that they 
have some reliability and validity, for they 
were predictable from inventory scores. 

¹ The contributor of the data on which this report 
is based wishes his organization to remain anonymous. 
I am nevertheless grateful to him for making 
the data available. 

² R. R. Mackie. Norms and validities of 16 test 
variables for predicting success of foremen. A Master’s 
thesis in the University of Southern California 
Library, 1948. 

Another study has dealt with the validation 
of the 13 inventory scores against the 
rating criterion.² The interest in the present 
report is directed toward individual differences 
in the tendency of the examinees to 
use the question-mark response to the items. 
It will be remembered that the alternative 
responses to the items are “Yes,” “?,” and 
“No.” Each examinee could be given a score 
according to the number of “?” responses he 
gave. It was hypothesized, subject to cer- 
tain qualifications to be mentioned later, that 
a large portion of the variance of the “?” 
score represents a personal trait of indecision. 
The greater the number of “?” responses an 
individual gives, the greater his degree of in- 
decisiveness. It was also hypothesized that 
indecisiveness is an unfavorable trait for fore- 
men and it was consequently predicted that 
the correlation between this score and the 
criterion would be significantly negative. 
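
The scoring rule just described amounts to a simple count. A minimal sketch follows; the function name and the two answer sheets are hypothetical and are not taken from the foremen's data.

```python
def indecision_score(responses):
    """Number of "?" responses on one answer sheet, where each
    response is one of "Yes", "?", or "No"."""
    return sum(1 for response in responses if response == "?")


# Hypothetical answer sheets, for illustration only.
sheets = {
    "foreman_a": ["Yes", "No", "?", "Yes", "?"],
    "foreman_b": ["No", "No", "Yes", "Yes", "No"],
}

scores = {name: indecision_score(answers) for name, answers in sheets.items()}
print(scores)  # {'foreman_a': 2, 'foreman_b': 0}
```

Under the hypothesis stated above, the higher count would be read as the greater degree of indecisiveness.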

The “?” score comes in the general cate- 
gory of response-set scores that are receiv- 
ing increasing attention as possible objective 
measures of personality traits. The meaning 
of such scores, even when they prove to be 
highly reliable, must always be questioned. 
While the first hypothesis about the meaning 
of the proposed score is that it measures a 
trait of indecisiveness, there can be other hy- 
potheses, which we will consider. 

Indecision can enter into the picture in 
more than one way. Let us assume first that 
the examinee is cooperative and attempts to 
answer each item in the way that most nearly 
describes himself. He is most likely to waver 
between responses “Yes” and “No” under 
two conditions. One is when he does not 
know himself very well with respect to the 
question asked. Some “?” responses, of 
course, represent complete ignorance or in- 
ability to give one of the other responses. 
But when there is partial knowledge, whether 
the examinee will give the “?” response or 
one of the others will depend upon his readi- 
ness to make a more or less arbitrary choice 
versus his inclination not to do so. This is 
the kind of case whose behavior one would 
like to measure by means of an “indecision” 
score. 

Another occasion for wavering is when the 
examinee knows himself well but is himself 
near the limen for the item; he is on the 
borderline that to him separates “Yes” and 
“No.” This kind of indecision, too, we would 
like to have included in the measurement, 
since it is probably psychologically identical 
with the first type mentioned. In this con- 
nection we have the problem of equality of 
opportunity for indecision. Presumably, an 
examinee who is near the limen for most of 
the items has much more occasion for waver- 
ing than an examinee who is decisively on one 
side or the other of the trait continuum for 
most items. If a “?” score were based upon 
the items that are keyed for one trait only, it 
can be seen that those who earn moderate in- 
ventory-trait scores have more opportunity 
for wavering than those who earn scores at 
either extreme. The relation between the 
“?” score and the inventory-trait score would 
be curvilinear. Since each inventory is scored 
for relatively independent factors and the in- 
tercorrelations of scores tend to be small, it 
is very unlikely that an examinee will be at 
moderate positions on all traits. Hence the 
opportunities for wavering are somewhat 
equalized if we obtain a “?” score from all 
the items in the inventory combined. Some 
index of opportunity might well be taken into 
consideration, however, if we want variations 
in “?” scores to represent traits such as in- 
decision, freed from involvement with pat- 
terns of factor scores. 

A third occasion for wavering and inde- 
cision occurs among those examinees who 
may have decided to answer the items not as 
they are but as they think will make a good 
impression. Here the wavering is with re- 
spect to which is the more favorable re- 
sponse, “Yes” or “No.” It is likely that 
without knowledge of the key and with lack 
of experience in taking inventories, many in- 
stances of liminal alternatives arise. Again 
the kind of indecisiveness in which we are in- 
terested would have room for play. The “?” 
responses given under this condition should 
also indicate the trait we want to measure. 

Three possible meanings of the “?” response 
have been discussed. All of them, it 
has been argued, are potentially contributory 
to the indecisiveness variance we want to 
emphasize in the score. Other meanings that 
do not contribute to this variance include 
cases in which the examinee does not know 
the answer to the question and he should 
therefore legitimately respond with the “?,” 
and cases in which he is at or near the limen 
for the item and a “?” response represents a 
correct position for him between “Yes” and 
“No” on the trait continuum. But, as was 
pointed out above, there are individual dif- 
ferences in tolerance of an indecisive re- 
sponse, and this fact makes the “?” response 
contribute to the variance we want. On the 
other hand, there is a possibility that the sig- 
nificant difference here is in the form of will- 
ingness to guess or to gamble versus a caution 
in this regard. This is not logically an as- 
pect of the indecisiveness variable with which 
we are concerned. 

If the lack-of-self-knowledge component of 
the “?” score is appreciable, it would add 
to reliability and also probably to validity 
against the foreman criterion. The greater 
the lack of knowledge of self, the poorer 
should be the chances of success as a foreman 
or of leaders of other kinds. The willingness- 
to-guess component should add to reliability 
but its effect on the “?” score (which is to 
reduce that score) would tend to detract from 
validity against the foreman criterion, assum- 
ing that good foremen are inclined to be cau- 
tious in a situation like this. 


Results 


Each of the three inventories was adminis- 
tered as a unit and was given an indecision 
score as a unit rather than factor by factor. 
This was partly to assure a larger range of 
scores and partly to equalize opportunity for 
wavering, as suggested above. It was of in- 
terest, first, to determine whether individual 
differences in indecision scores are consistent 
from one inventory to another. The inter- 
correlations of scores from the three inven- 
tories provide estimates of alternate-form re- 
liability. 

The frequency distributions of the three 
indecision scores all approached the Poisson 
type, with modes at a score of zero. The 
proportions of zero scores were .41, .48, and 
.55. This form of distribution was obtained 
under the pressure of the instruction for the 
examinee to avoid the “?” response. It was 
assumed that the underlying trait continuum, 
however, was one on which the distribution 
in the population is normal. Tetrachoric cor- 
relations were therefore computed. They 
were .73, .75, and .88, with an average cor- 
relation (Fisher-Z method) of .80. Since 
these intercorrelations were fairly high the 
three indecision scores were summed to yield 
one score for each examinee. The reliability 
of such a score should be in the region of .90. 
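
The averaging and the reliability estimate above can be reproduced in a short sketch. Two points are assumptions rather than statements from the text: the average is taken as an unweighted mean of Fisher z values, and the reliability of the summed score is obtained from the general Spearman-Brown formula with three components. The function names are hypothetical.

```python
import math


def fisher_z_mean(correlations):
    """Average correlations by transforming each to Fisher's z,
    taking the mean, and transforming back to r."""
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))


def reliability_of_sum(r_mean, k):
    """Spearman-Brown estimate for a sum of k comparable scores
    whose average intercorrelation is r_mean."""
    return k * r_mean / (1.0 + (k - 1.0) * r_mean)


# The three tetrachoric intercorrelations reported in the text.
intercorrelations = [0.73, 0.75, 0.88]

r_bar = fisher_z_mean(intercorrelations)
combined = reliability_of_sum(r_bar, k=3)
print(round(r_bar, 2), round(combined, 2))  # 0.8 0.92
```

Under these assumptions the result agrees with the reported average of .80 and with a combined reliability in the region of .90.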

The correlation of this combined indecision 
score with the rating criterion was also found 
by means of the tetrachoric r. The sample 
of 405 foremen was divided into two groups, 
one having to do with tools and maintenance 
and the other with production. The scatter 
plots show no signs of non-linearity. The 
validity coefficients were +.14 and −.09 
for the two groups, respectively. With Ns 
of 119 and 286, these coefficients are statistically 
insignificant. They also differ in sign. 
We may therefore accept the idea that they 
are random deviations from zero correlation 
and conclude that there is no support whatever 
for the original hypothesis. 

While there is no evidence of validity of 
the indecision scores in connection with the 
performance of these foremen, the level of re- 
liability of the scores is promising of a type 
of personality measurement that has much 
stability and may be well worth further 
study. To be useful for practical purposes, 
however, something would need to be done 
to improve discrimination at the lowest levels 
where discrimination is now very poor. Had 
there been differentiation among those scor- 
ing zero, we might even find some relation- 
ship between scores in that range and the 
criterion. It would be more reasonable to 
expect the relationship to appear among 
scores at the upper levels, however, where 
none was found. Since the reasoning con- 
cerning the contributions to variance in the 
“?” score indicates several possible traits, a 
factor analysis of the score is definitely called 
for as a basis for intelligible future predic- 
tions. 


Summary and Conclusions 


An “indecision” score was obtained by 
counting the number of “?” responses to 
items in the Guilford personality inventories 
STDCR, GAMIN, and Personnel Inventory. 
The three scores showed an average intercor- 
relation of .80, indicating that they measure 
much the same trait or traits. A combination 
of these three scores correlated +.14 and 
−.09 with a rating of proficiency of foremen 
in an industrial plant, whereas a significant 
negative correlation had been predicted. 
While the indecision score indicates some- 
thing stable about individuals, it needs to be 
factor analyzed to be understood and test 
conditions that will assure better discrimina- 
tions at the lower levels are needed for a 
score of practical use. 


Received September 14, 1953. 

Many currently-employed indices of on-the- 
job performance do not adequately measure 
job success, because the original job descrip- 
tions do not clearly depict the psychological 
requirements of the job being studied. As 
research progresses, it is becoming increas- 
ingly obvious that the usual types of job de- 
scriptions are neither rigorous nor analytic 
enough to furnish a sound basis for the de- 
velopment of valid measuring devices. There 
is a need for new methods by means of which 
the basic dimensions of job success can be 
isolated and precisely described. 

The present study is the first of a series de- 
signed to investigate the possibility of deriv- 
ing meaningful job requirements by statistical 
rather than by “rational” analysis. The re- 
search plan calls for factor-analyzing de- 
scriptions of on-the-job behavior obtained by 
interviewing the peers and supervisors of se- 
lected Air Force Airplane and Engine Me- 
chanics. 

This procedure was guided by the follow- 
ing working hypotheses: 


1. Peers and supervisors can select representatives 
of three categories of mechanics, 
viz., best, average, and poorest. 
Descriptions of representatives of these 
three categories will reflect individual 
differences in psychological variables related 
to job proficiency. 

2. Factor analysis of the descriptions will 
assist in understanding some of the psychological 
characteristics related to job 
proficiency. 

3. Ability test items can be prepared which 
measure these psychological characteristics; 
individual differences in responses 
to these items will be related to criteria 
of job proficiency. 


¹ This study was supported in part by the United 
States Air Force under Contract AF 33(038)-25726, 
monitored by the Commanding Officer, Human Resources 
Research Center, Attention: Director of Operations, 
Lackland Air Force Base, San Antonio, 
Texas. Permission is granted for reproduction, publication, 
use and disposal in whole or in part by or 
for the United States Government. 

² The authors wish to express appreciation to 
Charles Baldwin, Charles N. Cherry, and K. Patricia 
Cross for their assistance in collecting and editing the 
interview materials, to Walter A. Cleven and Malcolm 
M. Helper for carrying out the statistical analyses, 
and to Donald R. Shaw for assistance in the 
factor interpretation. 


If these hypotheses are to prove fruitful, 
the following conditions must be met: (a) 
the descriptions of “best” and “poorest” me- 
chanics should differ significantly; (b) the 
factors deriving from the descriptions should 
be meaningful; (c) the descriptive factors 
should be related to independent criteria of 
job performance, such as their rated job pro- 
ficiency by other informants; and (d) the 
use of the factors as guides in preparing 
ability items should result in tests which are 
more highly related to criteria of proficiency 
than those prepared exclusively by way of the 
job description approach. The first two con- 
ditions and a preliminary investigation of the 
third form the basis of the present study. 
More thorough investigation of the last two 
conditions will follow later, provided of course 
that the results of the present set of studies 
justify it. 


The Descriptive Inventory 


The present paper reports: (a) the prepa- 
ration of an inventory, called the Descriptive 
Inventory, designed to facilitate the descrip- 
tion of mechanics by their peers and su- 
pervisors; (b) a factor analysis of the results 
obtained when this inventory was used by 
supervisors to describe individuals whom they 
had selected as representative of “best,” “av- 
erage,” or “poorest” mechanics; and (c) an 
analysis of the relations of the items to “best” 
and “poorest” mechanics. 


To obtain items for the inventory, experienced 
mechanics, most of whom had been in super- 
visory positions, were asked to select the “best” 
(or “average” or “poorest”) A. & E. mechanic 
with whom they had worked within the last two 
years. A description was then sought of the be- 
havior of this mechanic both on and off the job, 
and the descriptions were subsequently divided 
into separate descriptive phrases. This plan was 
followed because it was believed that an inven- 
tory constructed in terminology familiar to main- 
tenance personnel would be used with more dis- 
crimination by mechanics than would one com- 
posed of more academic and technical phrases. 

Subjects and Procedure. A total of 104 stu- 
dents attending Flight Engineering School at 
Chanute Air Force Base served as subjects for 
the initial phase of this study. Each subject was 
individually questioned in an interview divided 
into two separate sections: (a) a free-response 
phase, in which the subject was asked simply to 
describe a fellow mechanic selected by him to 
represent one of the three categories of profi- 
ciency; and (b) a “structured” phase in which 
comments were elicited in response to specific 
questions asked by the interviewer. 

The Descriptive Phrases. To facilitate draw- 
ing of items from the interview protocols, each 
separate and complete idea (which in most in- 
stances could be represented by a single phrase) 
was demarcated from every other idea. This 
was done for the structured as well as the free- 
response portion of the interviews. 

These descriptive phrases thus obtained were 
extracted from the typescripts of the recorded 
interviews and assembled into a pool, which 
numbered in all some 15,000 items. From this 
pool, 264 items were selected on a random basis 
to serve as the raw material of the inventory. 
Each of these 264 items was then edited and re- 
viewed independently by three psychologists and 
five mechanics to the end that: (1) the ideas ex- 
pressed were always those of the interviewee; 
(2) each phrase was in the present tense; (3) 
items on which at least two of the three judges 
could not agree (in terms of meaning, wording, 
etc.) were eliminated as ambiguous; and (4) 
phrases whose meanings were dependent upon 
context were rewritten to make their meanings 
clear when used in isolation. 


First Pilot Study 


Upon completion of the editing, 235 items 
remained out of the original sample of 264. 
These items were assembled into an experi- 
mental inventory in which each item re- 
quired either a “yes” or a “no” answer. The 
inventory was administered to Air Force 
supervisors in order to: (a) obtain comments 
as to the meaningfulness and adequacy of 
coverage of the phrases; and (b) to secure a 
preliminary indication of the predictive utility 
of the phrases. To fulfill this latter aim, chi 
square values were computed for each item 
in order to determine whether or not it 
discriminated significantly between mechanics 
selected as “best” and “poorest.” 


Louis L. McQuitty, Charles Wrigley, and Eugene L. Gaier 
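
The item screening described above can be illustrated with a short sketch. It assumes a Pearson chi square on the 2 x 2 table of group ("best" versus "poorest") by answer ("yes" versus "no"), computed without a continuity correction; the paper does not state which form was used. The function name and the counts are hypothetical and are not taken from the study.

```python
def chi_square_2x2(best_yes, best_no, poorest_yes, poorest_no):
    """Pearson chi square (1 df, no continuity correction) for a 2 x 2 table."""
    n = best_yes + best_no + poorest_yes + poorest_no
    row_best = best_yes + best_no
    row_poorest = poorest_yes + poorest_no
    col_yes = best_yes + poorest_yes
    col_no = best_no + poorest_no
    if 0 in (row_best, row_poorest, col_yes, col_no):
        raise ValueError("every row and column total must be positive")
    cross = best_yes * poorest_no - best_no * poorest_yes
    return n * cross ** 2 / (row_best * row_poorest * col_yes * col_no)


# Tabled critical value of chi square for 1 degree of freedom at the .05 level.
CRITICAL_05 = 3.841

# Hypothetical answers to one item: 40 of 50 "best" and 15 of 50 "poorest"
# mechanics are described with "yes".
chi2 = chi_square_2x2(40, 10, 15, 35)
discriminates = chi2 > CRITICAL_05
```

An item whose chi square exceeds the tabled value would count as discriminating between the two groups at the chosen level.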


After these data had been obtained, the items 
were considered one at a time with respect to: 
(a) the mechanics’ comments; (b) the magni- 
tude of the chi square values; and (c) the pro- 
portion of subjects answering each answer al- 
ternative. Three judges decided whether to re- 
tain, amend, or reject items. Items with a 90% 
or more response for either answer alternative 
were rewritten to lessen this percentage when- 
ever this appeared possible; otherwise they were 
rejected. Items which supervisors reported to 
require information that they did not have were 
rejected, and those regarded as difficult to un- 
derstand were amended. 

In all, 35 items were eliminated in this phase 
of the study. The 200 remaining phrases were 
assembled in random order into a check list 
designated as the Descriptive Inventory. Al- 
though the entire inventory was used in the col- 
lection of data, only the first 120 items were 
analyzed in this study, for reasons stated later. 
Examples of the items are listed in Tables 1-2. 


Use of the Descriptive Inventory 


Our next immediate purposes were: (a) to 
isolate relatively independent clusters of in- 
terrelated descriptions; (b) to interpret these 
psychologically; and (c) to make a prelimi- 
nary investigation of their relation to job 
proficiency. 


Subjects. The Descriptive Inventory was ad- 
ministered to 428 Flight Engineering students at 
Chanute Air Force Base. Each subject had 
completed a course in Airplane and Engine Me- 
chanics and had had at least six months of su- 
pervisory experience. In length of line main- 
tenance experience the subjects ranged from six 
months to more than 21 years, with a median of 
four years. 

Administration. The inventory was adminis- 
tered in small group sessions (12 to 25 men) of 
about 30 minutes in length. The instructions 
printed on the face sheet of the booklets were 
read to the subjects before they began work on 
the inventory. All of the respondents were given 
ample time to complete the items. 

Method of Analysis. In order to reduce 
computational labor, the Descriptive Inventory was 
divided into two parts for factor analysis by 
the shortened square root method developed by 
Wrigley and McQuitty³ (a modification of 
Thurstone’s diagonal method). In the present 
paper, results are reported for the first 120 items 
only. (Since the 200 items in the Inventory 
were arranged in random order, there should be 
no significant difference in the type of items 
appearing in the two parts.) Phi coefficients were 
used to measure correlations; these and the 
factor loadings were calculated on IBM 
equipment, using punched card methods developed by 
Helper.⁴ 


3 Wrigley, Charles and McQuitty, Louis L. The 
Square Root Method of Factor Analysis: A 
Reexamination and a Shortened Procedure (Manuscript). 

In the square root factor analysis, a “pivot 
variable” was selected and factor loadings were 
calculated to reduce the correlations or residual 
correlations for that variable to zero. The pivot 
variables were selected with the aim of enhanc- 
ing the likelihood of getting predominantly posi- 
tive factor loadings, and of obtaining factors 
which are conceptually clear. The pivot variable 
was always the one with the highest absolute 
column sum. The same procedure was then repeated 
with another “pivot variable.” The method 
results in orthogonal factors, with all factor axes 
at right angles to one another. 
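
The pivoting procedure can be sketched in a few lines. The shortened method of Wrigley and McQuitty is cited only as a manuscript, so the sketch below follows the classical diagonal (square root) method instead: the pivot is the variable with the highest absolute column sum in the residual matrix, its loadings reduce its own residual correlations to zero, and the step is repeated on the residuals. The use of unities in the diagonal, the function name, and the three-variable matrix are assumptions made for illustration.

```python
import math


def square_root_factors(corr, n_factors):
    """Extract up to n_factors by the diagonal (square root) method."""
    n = len(corr)
    resid = [row[:] for row in corr]  # residual correlations, updated in place
    loadings = []  # one list of n loadings per extracted factor
    pivots = []    # index of the pivot variable used for each factor
    for _ in range(n_factors):
        # A variable with no residual variance left cannot serve as a pivot.
        candidates = [j for j in range(n) if resid[j][j] > 1e-12]
        if not candidates:
            break
        pivot = max(
            candidates,
            key=lambda j: sum(abs(resid[i][j]) for i in range(n)),
        )
        root = math.sqrt(resid[pivot][pivot])
        column = [resid[i][pivot] / root for i in range(n)]
        # Remove this factor; the pivot's row and column become zero.
        for i in range(n):
            for j in range(n):
                resid[i][j] -= column[i] * column[j]
        loadings.append(column)
        pivots.append(pivot)
    return loadings, pivots, resid


# Hypothetical correlation matrix for three items.
R = [
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]
loadings, pivots, resid = square_root_factors(R, 3)
sums_of_squares = [sum(x * x for x in factor) for factor in loadings]
```

Because each factor is computed from residuals left by the earlier ones, the factors are uncorrelated, and the sum of squared loadings of a factor measures its contribution to variance.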


In addition to the factor analysis, a pilot 
validity analysis was also completed. Using 
only the 204 inventories which described 
“best” and “poorest” mechanics, phi 
coefficients were computed between each item and 
the best-poorest dichotomy. These phi 
coefficients are here called criterion correlations. 
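
A criterion correlation of this kind can be illustrated with the standard formula for the phi coefficient of a 2 x 2 table. The counts below are hypothetical; in particular, the even split of the 204 inventories into 102 "best" and 102 "poorest" descriptions is an assumption, since the paper does not report the split.

```python
import math


def phi_coefficient(best_yes, best_no, poorest_yes, poorest_no):
    """Phi coefficient between a yes/no item and the best-poorest dichotomy."""
    denominator = math.sqrt(
        (best_yes + best_no)
        * (poorest_yes + poorest_no)
        * (best_yes + poorest_yes)
        * (best_no + poorest_no)
    )
    if denominator == 0:
        raise ValueError("every row and column total must be positive")
    return (best_yes * poorest_no - best_no * poorest_yes) / denominator


# Hypothetical item: "yes" for 90 of 102 "best" and 10 of 102 "poorest".
phi = phi_coefficient(90, 12, 10, 92)
```

A positive value marks an item checked mainly for "best" mechanics, and a negative value an item checked mainly for "poorest" mechanics.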

A total of 18 factors were extracted from 
the 120 variable matrix. By this time, the 
point of decreasing returns appeared to have 
been reached, as shown by the drop in rela- 
tive proportions of variance accounted for by 
the 15th, 17th and 18th factors; moreover, 
the factors became less obvious in psycho- 
logical meaning. 

In order to insure that no major group fac- 
tors had been omitted, a list was prepared of 
all the items not appearing within the ten high- 
est loadings of any one of the first 18 factors. 
Further pivots were drawn from this reduced 
list. This procedure was designed to guaran- 
tee that some axes passed through that por- 
tion of the hyperplane which had not 
previously been traversed. The value of the 
procedure in the present study was demonstrated 
by the fact that the next factor—the 19th in 
order of extraction—proved to be the sixth 
in the order of variance. The next four were 
less encouraging. Consequently, the factor- 
ing was discontinued at this point. This 
made a total of 23 factors extracted. 


4 Helper, M. M. Punched-card procedures for 
square root factor analysis (Manuscript). 


Item Validities 


It will be of interest first to consider the 
phi coefficients between the individual items 
and the “best-poorest” dichotomy for the 204 
mechanics classified in this fashion. These 
criterion correlations range in magnitude from 
.87 to .01 with a mean of .48. Although 
these results show some very substantial 
relationships between the descriptive items and 
the “best-poorest” criterion, they cannot, of 
course, be accepted as final evidence that the 
descriptions are related to proficiency on the 
job. They indicate, however, the features 
regarded by the supervisors as significant. 


Table 1 


Descriptive Inventory Criterion Correlations for Items 
with High Predictive Value (φ > .70) 


Phi 
Coefficient   Item 

Items characteristic of good mechanics 
 .868         He makes sure he does a good job 
 .835         When he does a job you know it will be done right 
 .812         He deserves a promotion 
 .800         If you leave him to do a job, you can always be sure he will get the job done 
              He is good at working on the plane 
              He can show you how to do the job right 
              He is a good man on any job 
              He knows his stuff 
              He seems to take pride in his work 
              You don’t have to worry about telling him what to do all the time 
              He tries to find better ways of doing things 
              His ambition will pay off 
 .721         He gives good cooperation 
 .712         He will straighten a guy out and explain things to him 
 .703         If he were a crew chief, he would work right along with his men 

Items characteristic of bad mechanics 
-.778         You wouldn’t feel safe unless you checked behind him 
-.753         Most guys with that much experience know a lot more than he does 
-.750         He works in a sloppy way 
-.748         He isn’t a very careful worker 
              He achieves his aim in the wrong way 
              He is kind of slipshod in his ways 
              He doesn’t have any sense of responsibility 

Items with highest criterion correlations 
(see Table 1) are mostly generalized descrip- 
tions of behavior on the job, e.g., “He makes 
sure he does a good job”; “When he does a 
job, you know it will be done right.” These 
characterizations, however, give little detailed 
information as to the psychological com- 
ponents of job success. The advantage in 
carrying out a factor analysis of the items is 
that more analytic dimensions are thus de- 
veloped. 

The items with the lowest criterion correla- 
tions (see Table 2) are less job centered in 
their orientation, and deal with such traits as 
drinking habits, social demeanor, truthfulness, 
appearance, etc. 

The items with high criterion correlations 
agree for the greater part with the items with 
high loadings on the first factor. This is par- 
ticularly evident if the highest 20 items in 
each instance are considered. Sixteen items 
are common to both the lists of the 20 
highest factor loadings in the first factor and 
the 20 highest (absolute) correlations with 
the criterion. The main differences are: (a) 
the order of appearance of the items is some- 
what changed in the two lists; (b) the high- 
est loadings on the first factor stress the ele- 
ments of cooperation and dependability, but 
the corresponding criterion-correlation items 
are even more generalized, saying little more 
than that the mechanic does a good job. 

The advantages in making the factor analy- 
sis are thus clearly seen. If the study had 
been restricted to criterion correlations, there 
would have been no concise account of differ- 
ent psychological components related to the 
pilot criterion. The function of the factor 
analysis is to aid in identifying some con- 
stituents which are involved in the descrip- 
tions of mechanics. 


Interpretation of Factors 


As computational methods improve and the 
analysis of more variables for larger numbers 
of subjects becomes practicable, factor 
analysts will probably become accustomed to 
dealing with smaller loadings. In this study, 
loadings as low as .10 usually appear to be 
quite meaningful, and not at all inconsistent 
with the general interpretation of the factor. 
The twelve factors accounting for the most 
variance are reported here. The sums of 
squares of loadings for each of the factors 
are presented in Table 3.⁵ 


Table 2 

Descriptive Inventory Criterion Correlations for Items 
with Low Predictive Value (φ < .20) 


Item   Phi 
No.    Coefficient   Item 

3 
55     .026 

                     He is of just average appearance 
                     If there is something he likes to do, he does it faster than anyone else would 
                     He doesn’t lose his temper 
                     He would give the shirt off his back 
                     He associates with fellows like himself 
                     He is of above average appearance 
                     He is quite young for his rank 
                     He doesn’t drink 
                     He likes to drink on his off-duty hours 
                     Sometimes he gets “T’d off” 
                     His basic training was rather short 
                     We have had a couple of “run-ins” 
                     He appears very rude to a stranger 
                     He hasn’t had too much time in Service 
                     He doesn’t care to mix with other people 

Tables 4 through 15,⁶ one for each of the 
twelve factors, report (a) the items with the 
ten highest loadings for each factor, the 
positive loadings first, followed by the negative 
ones; (b) the interpretation of each factor; 
(c) the phi coefficient between the subjects 
selected as “best” or “poorest” and the item 
response; and (d) the code number of the 
item in the inventory. Loadings as low as 
.10 were accepted here because of the large 
number of both variables and subjects (N = 
428). 


Table 3 


Square Root Factor Analysis: Sums of Squares of Factor Loadings 


Factor in       Factor in             Sum of Squares of 
Order of Size   Order of Extraction   Factor Loadings 
 1                                    24.7028 
 2                                     2.8732 
 3                                     2.3592 
 4                                     2.3522 
 5                                     2.0080 
 6              19                     1.9102* 
 7                                     1.8269 
 8                                     1.7407 
 9                                     1.6646 
10              14                     1.6644 
11               7                     1.6302 
12              13                     1.5756 
13              23                     1.5530* 
14              22                     1.5272* 
15               8                     1.4942 
16              20                     1.4780* 
17               3                     1.4096 
18                                     1.3614 
19              16                     1.2952 
20              21                     1.2469* 
21              17                     1.1953 
22              18                     1.1584 
23              15                     1.0812 
Total sum of squares                  61.1084 
Contribution to variance              50.92% 

* Pivots for these factors were selected from a reduced list of variables, viz., those which had hitherto not 
carried very high loadings on any factor. 


5 In the use of the square root method, the larger 
factors tend to, but do not necessarily, appear before 
the smaller. In presenting results here, the factors 
have been rearranged in order of contribution to 
variance, and to conserve space, smaller factors have 
not been reported. The factor loadings are on file 
for all 23 factors at the Training Research 
Laboratory, University of Illinois. 

6 Tables 4-15, and 16 and 17 have been deposited 
with the American Documentation Institute. Order 
Document No. 4248 from the ADI Auxiliary 
Publications Project, Photoduplication Service, Library of 
Congress, Washington 25, D. C., remitting in 
advance $2.25 for 35 mm. microfilm or $5.00 for 6 × 8 
in. photocopies. Make checks payable to Chief, 
Photoduplication Service, Library of Congress. 

Table 16 lists all 120 items of the Descrip- 
tive Inventory. Table 17 gives the 12 factor 
loadings for each of the 120 items. Tables 
16 and 17 are also deposited in ADI. (See 
footnote 6.) 
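
The "contribution to variance" entry in Table 3 can be checked with simple arithmetic. The sketch below assumes that each of the 120 items contributes one unit of variance (unities in the diagonal of the correlation matrix), which is consistent with the tabled percentage; the variable names are illustrative.

```python
# Sums of squares of factor loadings from Table 3, in order of size.
sums_of_squares = [
    24.7028, 2.8732, 2.3592, 2.3522, 2.0080, 1.9102, 1.8269, 1.7407,
    1.6646, 1.6644, 1.6302, 1.5756, 1.5530, 1.5272, 1.4942, 1.4780,
    1.4096, 1.3614, 1.2952, 1.2469, 1.1953, 1.1584, 1.0812,
]
n_items = 120  # each item is assumed to carry unit variance

total = sum(sums_of_squares)
percent_all_23 = 100 * total / n_items

# Share accounted for by the twelve largest factors, those reported in detail.
top_twelve = sum(sums_of_squares[:12])
percent_top_12 = 100 * top_twelve / n_items
```

The total reproduces the tabled 61.1084, or 50.92 per cent of the variance of the 120 items, and the first factor alone accounts for about two-fifths of that total.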


Relation of Factors to Criterion 


The problem remains as to whether all fac- 
tors described by the mechanics are related 
to the criterion. Results may be summarized 
by presenting the average criterion correla- 
tion for the 10 items with highest loadings in 
each factor. Those factors which appear to 
be measuring somewhat the same area of be- 
havior have been grouped together. 

Factor                                                    Average Criterion 
No.    Factor Title                                       Correlation 

Aspects of drive and initiative. 
 1.    Sense of responsibility 
 3.    Willingness for work 
 4.    Laziness 
 8.    Industriousness 
Aspects of practical efficiency. 
 6.    Failure to use knowledge effectively               .71 
14.    Practical workmanship                              .66 
16.    Lack of craftsmanship                              .70 
Aspects of knowledge and intellectual powers. 
 7.    Teaching capacity                                  .41 
 9.    Memory                                             .55 
15.    Intellectual capacity                              .48 
21.    Job knowledge                                      .47 
Aspects of social manner. 
12.    Social acceptability 
       Personal pleasantness 
19.    Anti-sociability 
Aspects of interest and morale. 
 2.    Interest in aircraft maintenance 
13.    Lack of morale 
Aspects of character. 
 5.    Weakness of character 
10.    Self-control 
20.    Lack of self-control 
Other factors. 
11.    Inexperience 
18.    Tendency to mediocrity 


These results appear quite clearcut. The 
drive and initiative shown by the mechanic, 
on the one hand, and his practical efficiency, 
on the other, are most closely related to the 
pilot criterion. The seven factors which are 
grouped under these two headings have the 
highest average for criterion correlations. The 
factors dealing with social manner of the 
mechanic, his interest in aircraft and in the Air 
Force, and his intellectual powers are less 
highly related to the criterion. His drinking 
and money habits, and his ability to control 
his temper have little or no relation here. 

The factors which account for more vari- 
ance tend to be more highly related to the 
pilot criterion, as shown by the fact that the 
rank-order correlation between factor vari- 
ance and mean criterion correlation for the 
factors, using all 23 factors, is .51. 


Discussion 


Interest and Motivation in the Supervisors’ 
Accounts. In discussing these results, we 
must bear in mind that these were the ratings 
made by supervisors, and their judgments 
may reflect their own conceptions rather than 
actual job performance. In terms of their 
judgments, interest and motivation appear as 
the principal factors in the descriptions of 
“best,” “poorest,” and “average” Aircraft and 
Engine Mechanics. Primarily, the mechanic 
is described as to: (a) whether he is coopera- 
tive and can be depended upon to get the job 
done; and (b) whether he likes being a me- 
chanic and working on aircraft. The me- 
chanic’s behavior is reported more frequently 
than what he knows. Little is said about his 
technical knowledge; and “poorest” mechan- 
ics are frequently described as disinterested 
or lazy rather than stupid or inadequately 
trained. The overall picture of the good me- 
chanic is one of being responsible and being 
willing to work and learn a few simple things, 
rather than of any extensive knowledge of the 
principles of mechanics. 

The Place of Mechanical Information. All 
Aircraft and Engine Mechanics are supposed 
to possess at least a minimal amount of tech- 
nical knowledge. Presumably some mechani- 
cal learning is necessary if a man is to be able 
to service an aircraft, but this study has re- 
vealed neither the nature nor the amount of 
this basic information that is required of the 
successful mechanic. Supervisors make little 
reference to lack of this, even in their de- 
scriptions of bad mechanics. Hence, we may 
assume that: (a) the amount needed is less 
than has generally been considered to be the 
case; (b) the Air Force is highly successful 
in giving to all recruits who pass through 
technical training school the groundwork of 
knowledge which is prerequisite to satisfac- 
tory job performance; or (c) the supervisors 
neglected in their descriptions a significant 
characteristic in which mechanics differ. In 
any case, the restriction of this study to men 
already trained minimized the importance of 
mechanical knowledge, and consequently em- 
phasized differences in motivation and per- 
sonality. Even if differences of interest and 
willingness do not give the whole story, this 
factor analysis has made abundantly clear 
that they are, at least, the primary variables 
in the descriptions by supervisors of mechan- 
ics whom they selected to represent different 
levels of efficiency. In other words, the re- 
sults support the hypothesis that getting good 
Aircraft and Engine Mechanics is not entirely 
a problem of accumulating knowledge; it is, 
at least in part, a matter of motivation, in- 
terest, and morale. 


Summary 


Before additional tests designed to predict 
success of mechanics are written, specific hy- 
potheses are needed outlining the dimensions 
which enter into job proficiency. The present 
study attempted to isolate some of these hy- 
potheses by: (a) obtaining descriptions from 
supervisors, in their own words, of Airplane 
and Engine Mechanics (selected to vary in 
proficiency); and by (b) factor-analyzing a 
compendium of these descriptions. A square 
root factor analysis of 120 of these descrip- 
tions resulted in the following hypotheses, for 
further study. 

1. There are a large number of rather inde- 
pendent dimensions of behavior related to job 
proficiency. Of the 23 factors extracted, 
practically all of these were found to be re- 
lated to differences in mechanics selected as 
representative of “best” and “poorest” job 
performers by the supervisors who described 
them. 

2. The six most clearly defined of the 23 
dimensions were asserted to be: (a) sense of 
responsibility; (b) interest in aircraft main- 
tenance; (c) willingness for work; (d) lazi- 
ness and lack of initiative; (e) weakness of 
character; and (f) failure to use knowledge 
effectively. 

It was concluded that supervisors describe 
trained mechanics who are selected by them 
to vary in proficiency much more in terms of 
interest and motivation than in terms of the 
amount of job knowledge possessed. 


Received August 6, 1953. 
There has been a growing recognition within 
the military services of the potential useful- 
ness of job evaluation for various purposes. 
Military job evaluation, for example, can con- 
tribute to the more adequate differential 
qualitative allocation of personnel among the 
services, to the development of career pro- 
grams, and to the equalization across the 
services of grades and ranks for comparable 
types of responsibilities and duties. 

The present investigation is a pilot study 
relating to the job² evaluation of enlisted 
jobs within the United States Navy. Specific 
purposes of the study were those of identify- 
ing the factors which contribute to differences 
in job values, and of determining the relative 
importance of each such factor. 


1 This article is based on a study carried out by 
the Occupational Research Center, Purdue Univer- 
sity, under the provisions of a research contract be- 
tween the Office of Naval Research and the Purdue 
Research Foundation (Contract No. N7onr-39410). 
The views expressed herein are those of the authors 
and do not necessarily represent the views of the 
Navy Department. 

The authors wish particularly to express their ap- 
preciation to Mr. D. G. Price, Chief, Billet and 
Qualifications Research Branch, Personnel Analysis 
Division, Bureau of Navy Personnel, for his cordial 
cooperation in making arrangements for many phases 
of this investigation. 

2 The term “job” does not have a specific con- 
notation in the Navy as it does in industry and busi- 
ness, but is used in this article for purposes of con- 
venience of terminology. Because of the nature of 
shipboard operations, an enlisted man must perform 
several different sets of duties at different times un- 
der different conditions that are involved in operating 
and fighting a ship. Thus, an enlisted man usu- 
ally has certain “routine” duties that he performs 
(these are the regular duties of the individual in 
tasks that are related to his Navy rating); he also 
has certain “Watch, Station, and Quarter Bill” as- 
signments that he performs under specific shipboard 
conditions, as for example during emergencies or dur- 
ing specified operations. The study reported in this 
article was based largely on what might be thought 
of as the “routine” duties of enlisted personnel. 


Experimental Procedures 


The experimental procedures basically in- 
volved the identification, for one representa- 
tive sample of enlisted jobs, of the factors 
(and of their statistically determined weights) 
which gave the optimum degree of relation- 
ship with criterion values, and the cross vali- 
dation of the results with a second repre- 
sentative sample of jobs. It was hypothesized 
that if a particular collection of factors, with 
their appropriate weights, would predict job 
values with two independent samples, a job 
evaluation system structured on such results 
would be of general applicability to the en- 
tire population of enlisted naval jobs. The 
criterion consisted of rankings of the jobs by 
experienced naval personnel on over-all diffi- 
culty and responsibility. The evaluations of 
the jobs on the various factors were made by 
Navy job analysts. 


The Samples of Jobs 

For the purposes of the investigation the 
“population” of naval jobs was considered to be 
those defined in The Manual of Enlisted Navy 
Job Classifications (4).³ 

Job Dimensions Considered. In order to 
obtain representative samples, the following four 
job dimensions were considered: (1) Job group 
(14 groups representing different areas, such as 
quartermaster jobs, electronics jobs, etc.);⁴ (2) 
Job levels (three levels, namely: Basic, 
Journeyman, and Supervisory); (3) Branch of service 
(Aviation versus Non-aviation); and (4) Job 
location (where the job typically occurs, namely: 
Shore, Shipboard, and Shipboard-Shore).⁵ 


3 The following groups of jobs were excluded from 
consideration: exclusive emergency service jobs (jobs 
of relatively restricted scope that typically exist as 
such only under conditions of full mobilization); 
jobs applicable to more than one rating; and 
specialists (job specialties, such as divers, for which 
some individuals are qualified, and which they may 
be called upon to perform now and then in addition 
to their regular duties). The total population of 
jobs after these exclusions was 825. 

4 The Manual of Enlisted Navy Job Classifications 
classifies jobs into 12 major groups. For the 
purposes of this investigation, however, they were 
divided into 14 groups. 

Selection of the Two Samples. The experi- 
mental and hold-out samples were then individu- 
ally so selected that each sample included per- 
centages in each category of the four dimensions 
which approximated the corresponding percent- 
ages in the total population. For the later pur- 
pose of deriving criterion values, these two sam- 
ples were combined, making a total of 103 jobs. 
Sixteen “extra” jobs were then added to these 
samples for purposes to be described later, mak- 
ing a total of 119 jobs. 
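
The idea of matching sample percentages to population percentages can be sketched as proportional stratified sampling. The sketch below stratifies on a single dimension (job level) and allocates places by largest remainders; the paper matched four dimensions and does not describe its allocation procedure, so the function, the stratum counts, and the sample size of 60 are all hypothetical.

```python
import random
from collections import defaultdict


def proportional_stratified_sample(jobs, stratum_of, sample_size, seed=0):
    """Draw a sample whose stratum shares approximate those of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for job in jobs:
        strata[stratum_of(job)].append(job)
    total = len(jobs)
    # Exact proportional quota per stratum, then largest-remainder rounding.
    quotas = {s: sample_size * len(members) / total for s, members in strata.items()}
    allocation = {s: int(q) for s, q in quotas.items()}
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - allocation[s], reverse=True)
    for s in by_remainder[:leftover]:
        allocation[s] += 1
    sample = []
    for s, members in strata.items():
        sample.extend(rng.sample(members, allocation[s]))
    return sample


# Hypothetical population of 825 jobs split over three job levels.
population = (
    [("Basic", i) for i in range(330)]
    + [("Journeyman", i) for i in range(330)]
    + [("Supervisory", i) for i in range(165)]
)
sample = proportional_stratified_sample(population, lambda job: job[0], 60)
level_counts = {level: sum(1 for job in sample if job[0] == level)
                for level in ("Basic", "Journeyman", "Supervisory")}
```

Matching several dimensions at once, as the study did, would require either stratifying on their cross-classification or adjusting the draw until each marginal percentage is acceptably close.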
The Criterion 

The “validation” of industrial job evaluation 
systems usually is in terms of the extent to which 
a given system results in a satisfactory degree of 
relationship with prevailing wage or salary levels. 
Since naval pay for various grades and ranks is 
established by legislative enactment rather than 
by labor market “supply and demand” factors, a 
different type of criterion of job values was nec- 
essary. For this purpose twenty-nine representa- 
tives of the naval service served as judges in 
ranking the selected jobs on the basis of over-all 
difficulty and responsibility. 

Instructions to Judges. Each judge attended 
a meeting at which the purposes of the study 
and the rating procedures were discussed. Each 
judge was given a packet containing definitions 
(on separate sheets) of the 119 jobs, and a 
set of instructions. These instructions asked the 
judge to select the sheets of those jobs which he 
felt he could rank in relationship to other jobs, 
and then to rank these jobs on the basis of 
“over-all job difficulty and responsibility.” 


Evaluation of Sample Jobs 

The Experimental Job Evaluation System. An 
experimental job evaluation system with 13 fac- 
tors was set up for later use in evaluating the 
sample jobs.⁶ Definitions of all factors were 
incorporated in the experimental system. 


5 This determination was made by analysts in the 
Bureau of Naval Personnel. 

6 This system included the following factors: (1) 
Work Knowledges Required*; (2) Inherent Job 
Hazards; (3) Guidance and/or Supervision Re- 
ceived*; (4) Responsibility for Supplies and Equip- 
ment*; (5) Non-hazardous Working Conditions; 
(6) Physical Effort Required*; (7) Responsibility 
for the Safety of Others*; (8) Guidance, Supervisory 
and Command Responsibility*; (9) Potential Com- 
bat Hazards and Hardships; (10) Physical Skill*; 
(11) Mental Demand; (12) Military and Working 
Conditions*; (13) Attention. Eight of these fac- 
tors (those marked with an asterisk) were the same 
as those tentatively being considered for use in a 
service-wide system that was developed in connec- 
tion with the Military Occupational Classification 
Program of the Personnel Policy Board, Depart- 
ment of Defense. 
The Job Analysts Who Served as Evaluators. 
The jobs were evaluated by experienced job 
analysts in the naval service. The experimental 
and holdout samples were evaluated, at different 
times, by 32 and 13 analysts respectively; 11 job 
analysts were included in both groups and evalu- 
ated both samples. 

Method of Evaluation. Each analyst was 
given definitions of the jobs in the sample in 
question, a set of definitions of the thirteen ex- 
perimental factors, a set of instructions, and a 
set of record sheets. The instructions provided 
for the analyst first to select, from the sample 
in question, the definition sheets of the jobs 
which he felt he could rank relative to other jobs 
on the various factors. Instructions provided 
then for ranking these selected jobs on each of 
the 13 factors. For this purpose the “rank- 
comparison” system described by Bittner and 
Rundquist (1) was used. This system provides, 
in general, for the division of the items into sub- 
groups, the ranking of the items within each sub- 
group, and the subsequent “merging” of the sub- 
groups. 


Results 
I. Criterion Scale Values 


In order to derive scale values for the 103 
original sample jobs as such, the rankings by 
the judges of the 16 “extra” jobs were disre- 
garded; this was done for each judge by as- 
signing ordinal rank orders to the sample 
jobs in the order in which he ranked them, 
exclusive of any of the “extra” jobs which he 
had also ranked. 

Original Criterion Scale Values. Since each 
judge ranked only part of the sample jobs, it 
was necessary to take this into account in de- 
riving criterion scale values. A method de- 
scribed by Guilford (3, pp. 256-257), appro- 
priate to such situations, was used. These 
scale values, numerically, ranged from 0 
(high) to 3.2037 (low). 

Consistency of Rankings by Judges. In 
order to get an estimate of the consistency of 
the rankings of each judge with the rankings 
of the entire group of judges, a rank order 
correlation (rho) was computed for each 
judge between the rank order of the jobs he 
ranked and the rank order of those same jobs 
in the complete array, when the scale values 
of the jobs he ranked were put into ordinal 
rank sequence. These rank order correlations 
ranged from .60 to .94, with a median of .86. 
While all of these rhos were statistically 
significant (at better than the one per cent 
confidence level), it was decided, in the interests 
of criterion stability, to drop the rankings of 
the three judges with the lowest critical ratios. 
Subsequent analyses were made on the basis 
of the rankings of the remaining 26 judges; 
their median rho was .87. 
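
A minimal sketch of the consistency check for a single judge is given below. It assumes untied ranks, so that the familiar difference formula for rho applies; the job labels and scale values are hypothetical, and the function names are illustrative only.

```python
def spearman_rho(ranks_x, ranks_y):
    """Rank order correlation for two untied rankings of the same items."""
    n = len(ranks_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def judge_consistency(judge_order, scale_values):
    """Rho between a judge's order and the group order of the same jobs.

    judge_order: jobs as ranked by the judge, highest first.
    scale_values: {job: criterion scale value}; 0 is high, larger is lower.
    """
    # Put the judge's jobs into ordinal sequence by their group scale values.
    group_order = sorted(judge_order, key=lambda job: scale_values[job])
    group_rank = {job: r for r, job in enumerate(group_order, start=1)}
    judge_ranks = list(range(1, len(judge_order) + 1))
    return spearman_rho(judge_ranks, [group_rank[j] for j in judge_order])


# Hypothetical scale values for five jobs ranked by one judge.
values = {"A": 0.0, "B": 0.9, "C": 1.4, "D": 2.2, "E": 3.2}
rho = judge_consistency(["A", "C", "B", "D", "E"], values)
```

In this illustration the judge reverses two adjacent jobs relative to the group order, which lowers rho from 1.0 to 0.9.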

While such consistency does not provide 
demonstrable proof of the validity of the 
judgments of job values, it lends support for 
the use of such judgments as a criterion, in 
the absence of any “true” criterion of naval 
job values. 

Final Job Samples. In addition to analyzing
the reliability of the criterion rankings
among the several judges, an analysis was 
also made of the consistency with which the 
individual sample jobs were ranked. While 
this analysis will not be described in de- 
tail, suffice it to say that the 15 jobs that 
were judged with the least consistency were 
dropped from the samples, and were replaced, 
where possible, with “extra” jobs with similar 
dimension characteristics. Such replacements 
were not possible for all the jobs dropped, 
however, and the two samples were reduced 
to 58 and 37 jobs respectively. The jobs in 
these two groups (giving a total of 95) were 
the ones used later in the analysis of the 
various job evaluation factors. 

Final Criterion Scale Values. Criterion 
scale values were then recomputed using these 
95 jobs as ranked by the 26 judges men- 
tioned above. 


II. Analysis of Factor Evaluations on Ex- 
perimental Jobs 


Tentative Rank Orders on Individual Fac- 
tors. The factor rankings of the job analysts 
were used first in deriving tentative rank or- 
ders of the 58 experimental jobs on each of 
the 13 factors. The method presented by 
Guilford (3, pp. 256-257) for use in deriving 
scale values from several sets of incomplete 
rankings involves the intermediate computa- 
tion of “probability” values. These prob- 
ability values have the same rank orders as 
do the final scale values, and were therefore 
used as the basis for determining these tenta- 
tive rank orders. 

Reliability of Evaluation by Job Analysts. 
The following reliability analysis was made 
individually for each of the 32 job analysts.
The jobs which each analyst ranked were first 
extracted from the complete array of the 
tentative rank orders on each factor; these 
selected jobs were then assigned ordinal rank 
orders on each factor in the sequence in which 
they had been extracted. For each factor a 
rank order correlation (rho) was computed 
between the ordinal rank orders of the jobs 
which the analyst ranked as extracted from 
the complete array of jobs, and the rank or- 
ders of those same jobs as he had ranked 
them on the factor in question. 

The rhos for each analyst on the 13 fac- 
tors were then converted to Fisher's z values. 
The 13 z values for each analyst were then 
averaged, and these averages were then re- 
converted to rho correlations. These aver- 
age rhos ranged from .60 to .89 for the vari- 
ous analysts, with a mean and median for all 
analysts of .81. 
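
The averaging of correlations through Fisher's z can be sketched as below. The three rhos are hypothetical, and the function name `average_rho` is illustrative; the transformation itself is the standard one (z = arctanh r, r = tanh z).

```python
import math


def average_rho(rhos):
    """Average correlations through Fisher's z transformation."""
    z_values = [math.atanh(r) for r in rhos]   # each rho -> z
    mean_z = sum(z_values) / len(z_values)     # average in the z metric
    return math.tanh(mean_z)                   # mean z -> rho


# Hypothetical rhos for one analyst on three factors.
analyst_rhos = [0.70, 0.80, 0.90]
mean_rho = average_rho(analyst_rhos)
```

Because the z scale stretches high correlations, the average obtained this way lies somewhat above the simple arithmetic mean of the rhos.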

The average rhos for the individual analysts
were then subjected to a statistical
analysis to determine the extent to which 
they differed from those of the other ana- 
lysts. The seven analysts whose average 
rhos differed most from those of the remain- 
ing analysts were considered as candidates 
for being dropped for the subsequent analy- 
ses. Four of these analysts were dropped. 
The other three, however, had each ranked 
certain jobs which in turn had been ranked 
by limited numbers of other analysts; it was 
therefore considered desirable to retain the 
evaluations of these three analysts. The av- 
erage rhos of the 28 analysts retained ranged 
from .71 to .89, with a mean (computed from 
Fisher’s z values) of .82. While these values 
should be considered as being approximations 
rather than as precise indexes of the reliability 
of the analysts, the general level of the re- 
liability compares rather favorably with that 
which is typically obtained in industrial job 
evaluation studies. 

The rhos of the 28 analysts for each of the 
13 individual factors were then averaged, 
using Fisher’s z values. These average rhos 
ranged from .64 to .88 for the various fac- 
tors, with an average rho for all factors of .82. 

Final Scale Values on Individual Factors.
The rankings of jobs on the 13 factors by the
28 job analysts were used as the basis for deriving
final scale values for the 58 experimental
jobs on each of the factors. The previously
mentioned method for developing scale
values from several sets of incomplete rankings
was used.


Table 1

Correlations of Factor Scale Values with Criterion Scale
Values for Final Experimental Sample

Factor
No.    Factor Name                                           r

 1     Work Knowledges Required                            .954
 2     Inherent Job Hazards                                .193
 3     Guidance and/or Supervision Received               -.854
 4     Responsibility for Supplies and Equipment           .631
 5     Non-hazardous Working Conditions                   -.033
 6     Physical Effort Required                           -.139
 7     Responsibility for the Safety of Others             .248
 8     Guidance, Supervisory and Command Responsibility    .629
 9     Potential Combat Hazards and Hardships              .048
10     Physical Skill                                      .420
11     Mental Demand                                       .756
12     Military and Working Conditions                     .089
13     Attention                                           .505

Relationship of Factor Scale Values to Cri- 
terion Scale Values. Table 1 presents the 
correlations of the scale values on each of 
the 13 factors with the criterion scale values. 
These individual correlations range from .954 
to minus .854. 

In order to determine the factors and their 
weightings which gave the optimum degree of 
relationship with the criterion scale values, 
the Wherry-Doolittle test selection method 
was used. This method is described by Garrett
(2, pp. 435-558). The results of this
analysis are given in Table 2. This table 
shows the factors in the sequence in which 
they were selected, including the shrunken 
multiple correlation (R) with the criterion for 
each of the selected factors along with all 
previously selected factors. This table also 
gives the Beta weights and the subsequently 
derived “b” weights for the individual fac- 
tors selected. The first five factors selected 
gave an R of .968. The addition of the sixth 
factor caused no increase in the R, indicating 
that the first five factors by themselves gave 
the optimum degree of relationship with the 
criterion scale values. The unshrunken multi- 
ple correlation (R) of these five factors with 
the criterion was .970. 
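
The relation between the unshrunken and the shrunken multiple correlation can be illustrated with a short sketch. It assumes the form of the Wherry shrinkage correction in which the squared coefficient of alienation is multiplied by (N - 1)/(N - m), with m the number of factors selected; the article does not state which form was applied, so this is an assumption. The values .970, 58 jobs, and five factors are taken from the text.

```python
import math


def shrunken_multiple_r(r, n_cases, n_predictors):
    """Shrunken multiple R under an assumed Wherry-type correction."""
    k_sq = 1 - r ** 2                                   # unexplained variance
    k_sq_shrunk = k_sq * (n_cases - 1) / (n_cases - n_predictors)
    return math.sqrt(max(0.0, 1 - k_sq_shrunk))


r_shrunk = shrunken_multiple_r(0.970, 58, 5)
```

Under this assumption the correction reduces the reported R of .970 only slightly, which agrees with the shrunken value given in the text to three decimals.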

It will be observed that Factor 1 (Work 
Knowledges Required) by itself gave a cor- 
relation with the criterion of .954, indicating 
that this single factor accounted for a very 
high proportion of the variance in criterion 
scale values. Factor 3 and Factor 2 entered 
into the prediction of criterion values with 
negative weightings. The negative relation- 
ship of Factor 3 (Guidance and/or Super- 
vision Received) is readily understandable, 
but it is interesting to note that this factor 
“came through” while Factor 8 (Guidance, 
Supervisory and Command Responsibility) 
did not. The inverse relationship of Factor 2 
(Inherent Job Hazards) is consistent with 
the typical findings of industrial job evalua- 
tion studies. 


III. Cross Validation with Hold-out Jobs 


Table 2
Shrunken Multiple Correlations with Criterion of Factors in Order of their Selection, with Beta
and b Weights, for Final Experimental Sample

Factor                                                          Beta        b
No.    Factor Name                                     R       Weight     Weight

 1     Work Knowledges Required                      .954       .7664      .7578
 3     Guidance and/or Supervision Received                    -.2443     -.2505
 9     Potential Combat Hazards and Hardships        .964       .0645      .0631
 2     Inherent Job Hazards                          .966      -.1242
 7     Responsibility for the Safety of Others       .968
 6     Physical Effort Required                      .968


Derivation of “Predicted” Criterion Scale
Values Using Selected Factors. The five factors
identified as giving the optimum degree
of relationship with criterion values of the experimental
jobs were used in computing “predicted”
criterion values of the 37 hold-out
jobs. The first step involved in this process
was that of computing scale values for the 37
hold-out jobs on each of the five factors, using
the method previously described. The “b”
weights for these factors were then incorporated
in a regression equation, along with the
derived constant (K = .2418) in order to obtain
the predicted criterion values.

Correlation between Predicted and Actual 
Criterion Values. The predicted criterion 
scale values for the 37 hold-out jobs were 
then correlated with their previously deter- 
mined actual criterion scale values. This 
correlation was .937. This correlation is of 
such a magnitude that it gives assurance that 
the selected factors account for a very sub- 
stantial proportion of the criterion variance. 
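
The cross-validation procedure can be sketched as follows. Only the constant K = .2418 is taken from the text; the two b weights, the factor scale values, and the actual criterion values are hypothetical and serve merely to show the computation (the study used five factors and 37 hold-out jobs).

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def predict(factor_values, b_weights, constant):
    """Regression equation: constant plus weighted factor scale values."""
    return constant + sum(b * x for b, x in zip(b_weights, factor_values))


K = 0.2418                 # constant reported in the text
b = [0.75, -0.25]          # hypothetical b weights for two factors
holdout_factors = [[0.5, 2.0], [1.0, 1.5], [2.0, 1.0], [3.0, 0.2]]  # hypothetical
holdout_actual = [0.2, 0.6, 1.6, 2.4]                               # hypothetical

predicted = [predict(row, b, K) for row in holdout_factors]
validity = pearson_r(predicted, holdout_actual)
```

The essential point of the design is that the weights are fixed on the experimental sample and applied unchanged to jobs that played no part in deriving them.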


Conclusions 


The following conclusions seem warranted 
on the basis of the results of the investiga- 
tion: 

1. The criterion rankings of the sample jobs
by a number of representatives of the naval 
service reflect fairly stable concepts among 
them with respect to relative values of en- 
listed naval jobs; in the absence of any “true” 
criterion of naval job values, such reliable 
judgments may well be accepted as a cri- 
terion for use in job evaluation research. 

2. The rankings of the sample jobs by job 
analysts on the 13 factors in the experimental 
job evaluation system resulted in a satisfac- 
tory degree of reliability. 

3. Five of the 13 factors accounted for a
very large proportion of the variance in cri- 
terion scale values. For the experimental 
sample, the shrunken multiple correlation of 
these five factors with the criterion was .968. 
For the hold-out sample, the “predicted” cri- 
terion scale values (predicted by the use of 
a regression equation) gave a correlation of 
.937 with the actual criterion scale values. 
The third, fourth, and fifth factors identified 
by the Wherry-Doolittle test selection method 
(Factors no. 9, 2, and 7) added only slight 
increments to the shrunken multiple correla- 
tion; because of this, it is very probable that 
the first two identified factors (Factors no. 1 
and 3) would themselves adequately predict 
the criterion scale values, although the pre- 
dictive value of these two by themselves was 
not determined in the study. 

4. The results of the investigation were of 
such a nature as to suggest that a job evalua- 
tion system structured on the basis of these 
results could be expected to be of general ap- 
plicability to the entire population of enlisted 
naval jobs. 


Received August 10, 1953. 


Comparability of Personal Attitude Scale Administration with
Mail Administration With and Without Incentive ¹

Industrial psychologists have two obligations.
First, the techniques we use must
yield valid results. Second, we must be sure 
that the use of these techniques is economi- 
cally feasible. 

The attitude scale exemplifies an effective 
device whose utilization has been restricted 
by cost considerations. Where group ad- 
ministration is possible attitude scales are 
being used extensively. But many “groups” 
in the social sense are not grouped geographi- 
cally. In this latter case the attitude scale 
is generally administered by personal inter- 
view. And that’s where the cost per subject 
zooms upward. 

There’s no doubt but that mail adminis- 
tration would be cheaper than interviewing. 
But we have been a little hesitant about ad- 
ministering attitude scales by mail. We have 
questioned the validity of such an uncon- 
trolled technique. 

This study was planned to test validity. 
We wanted to determine the comparability of
three methods of attitude measurement:
one by personal interview, the other two
by mail.


Method 


A total of 127 subjects were personally in- 
terviewed. A 19-item Likert-type attitude 
scale formed part of the questionnaire. The 
sample was fixed address. Interviewers were 
permitted to go next door when the second 
call back was unsuccessful. No records were 
kept of respondent refusals or the number of 
next door calls. 

A slightly abridged form of the question- 
naire used by the interviewers was mailed to 
148 subjects. (The attitude scale was not
abridged.) Half received a 25¢ piece as incentive;
this group was sent one blanket follow-up
letter. The other 74 subjects were
just asked for cooperation; here there were
two follow-ups to non-respondents.

¹ This study was part of a total communications
analysis of a public utility. The subjects were all
customers of the public utility. The scale concerned
customer attitudes toward service, public ownership
and company personnel.


Results 


The questionnaire without the quarter re- 
ceived a 58% return. The quarter brought 
back 86% of the questionnaires. 

The same attitude scale had been person- 
ally administered to other groups. On the 
basis of 175 schedules (including the 127 dis- 
cussed above), the eight most discriminating 
items were selected. Table 1 shows the av- 
erage scale values which the three groups 
made on these eight items. 

The two mail techniques produced almost 
identical average values. Attitudes as de- 
noted by the personal interviews, however, 
were considerably lower. 

As a further test of comparability, two sets 
of correlations were computed. The first set 
dealt with the incidence of the median or 
“Undecided” response. The per cent fre- 
quency of this response was computed on 
each of the 19 items, for all three groups of 
subjects. These percentages were correlated. 
The results comprise Table 2. 

All relationships were strong. Especially 
comparable are the two mail surveys. Ap- 
parently incidence of “Undecided” does not 
vary item-by-item among the methods of ad- 
ministration. 
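
The first set of correlations can be sketched as below. The responses are hypothetical, and the coding of the five response categories as 1 to 5 with "Undecided" as 3 is an assumption; the study used 19 items, whereas three are shown here for brevity.

```python
import math


def percent_undecided(item_responses, undecided=3):
    """Per cent of responses to one item that fall on the midpoint."""
    hits = sum(1 for r in item_responses if r == undecided)
    return 100 * hits / len(item_responses)


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical responses to three items under two administrations.
interview = [[3, 3, 1, 5], [1, 2, 3, 4], [5, 4, 4, 1]]
mail = [[3, 2, 3, 3], [3, 1, 5, 5], [1, 2, 4, 5]]

pct_interview = [percent_undecided(item) for item in interview]
pct_mail = [percent_undecided(item) for item in mail]
r_undecided = pearson_r(pct_interview, pct_mail)
```

Note that the correlation is taken over items, not over respondents: each item contributes one pair of percentages.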


Table 1
Average Scale Values Attained on the Eight
Most Discriminating Items

Administration                          Average

Personal Interview (N = 127)
Mail, with Quarter (N = 64)
Mail, without Quarter (N = 43)

Table 2
Correlations of the Incidence of the “Undecided”
Response in Three Administrations

                                             Pearsonian
Comparison                                   Correlation

Personal Interview with Mail-Quarter            .84
Personal Interview with Mail-Non-Quarter        .83
Mail-Quarter with Mail-Non-Quarter              .95


The second set of correlations was also de- 
signed to test comparability. For each item, 
the three (out of five) least favorable re- 
sponses were grouped as non-favorable. The 
percentages of these non-favorable responses 
were calculated for each item, for all three 
groups. The correlations of these percent- 
ages are shown in Table 3. 

Once again the correlations were high. The 
two mail techniques were again most similar. 
Note that the “Undecided” response was one 
of the three non-favorable. This means that 
the correlations were to some extent a corol- 
lary of the relationships shown in Table 2. 


Table 3
Correlation of the Incidence of Non-Favorable
Responses in the Three Administrations

                                             Pearsonian
Comparison                                   Correlation

Personal Interview with Mail-Quarter            .84
Personal Interview with Mail-Non-Quarter        .81
Mail-Quarter with Mail-Non-Quarter              .92


Conclusions 


The healthy return indicates a possible
economy in mail administration of attitude 
scales. The per cent of mailed questionnaires 
returned was influenced by a financial incen- 
tive; results were not. Mailed administra- 
tions denoted higher attitudes than the per- 
sonal interview. Though this difference in 
scale values was pronounced, item-by-item 
ups and downs were the same with all three 
types of administration. 

Of course, all of these conclusions must 
be interpreted carefully. Individual attitude 
scale administrations are specific unto them- 
selves. But if the findings from other studies 
are similar, we may be able to consider the 
mailed attitude scale a good tool. The higher 
scale values are puzzling. But the item-by- 
item similarity is encouraging. We may find 
that the technique can be used, providing the 
denoted attitudes are depressed to some ex- 
tent. What that depression should be we 
cannot, of course, say at the present time. 


Summary 


Residential customers of a public utility
were administered an attitude scale. Three
methods of administration were used: per- 
sonal interview; mail with financial incentive; 
and mail without financial incentive. The 
responses obtained by each method were com- 
pared. The three were found to be reason- 
ably comparable. 


Received September 3, 1953. 


An Empirical Analysis of the Effectiveness of Psychological Warfare ¹

Of the many reports on Psychological Warfare
(PW) relatively few have been directed
toward analyzing hypotheses about the funda- 
mental nature of PW and the ways in which 
it acts upon the individual recipient. There 
are, of course, many reasons for this situa- 
tion. Unfortunately, this is one of the many 
cases in which we have had to employ a sys- 
tem without full information of the system 
and how it works. The authors were engaged 
by the Operations Research Office to design 
and carry out a research evaluation of several 
aspects of PW in Korea, especially to deter- 
mine certain of the antecedent and attendant 
psychological factors that influence the effec- 
tiveness of tactical Psychological Warfare. 

As a general working hypothesis, it was as- 
sumed that the fundamental effects of PW can 
be characterized in psychological form, and 
that they are predictable in terms of the atti- 
tudes, motives, and experiences of the re- 
cipients. Also, it was hypothesized that PW 
can affect an individual only in certain opti- 
mal conditions. There are, no doubt, wide 
individual differences in the state of preparedness
for the effects of propaganda. Theoretically
the individual soldier or civilian who
is content with his role, is well taken care of
physically, is in no state of fear, and is in
complete accord with the war aims and ideology
of the forces of his nation, will not be
sensitized to the content of PW. On the
other hand, the person who is at the opposite
poles of these characteristics may be so ready
to surrender or to show defection behavior
(or whatever our PW is designed to produce)
that he does not need the propaganda. PW
was thus thought of as having mainly a nudging
or precipitating effect on behavior somewhat
secondary to the preparatory effects of
the more physical and material aspects of
warfare.

¹ This report was extracted from a more detailed
and complete report of the total research project.
The material has been approved for presentation at
the 1953 meetings of E.P.A. and for publication, the
approval being granted by the Operations Research
Office and the Department of the Army. The views
herein expressed are those of the authors and do not
necessarily reflect the opinions of the Army or the
Operations Research Office. Official clearance of the
material in this report has precluded presentation of
several features of the investigation.


Criteria and Factors Studied 


Because the research was to be carried out 
in Korea and by interrogation of Chinese and 
North Korean Prisoners of War, the criteria 
used were restricted by these available con- 
ditions. As criteria the research employed 
the degree of willingness to surrender peace- 
fully as contrasted with having required force- 
ful capture and the degree of disaffection 
shown. 

Attempts were made to identify several 
factors which would serve as estimates of an 
individual’s position along the general con- 
tinuum of receptiveness to PW, and to in- 
clude estimates of behavior that were hy- 
pothesized as important conditioners of the 
criterion behavior. The following nine fac- 
tors were chosen for the investigation: 


A. Degree to which the individual, before 
the war, was in accord with the ideology and 
war aims of the Peoples Government. 

B. Degree to which, and frequency with 
which, the individual had experienced intensive 
fear during battle. 

C. Degree to which the individual felt he
had been poorly treated and physically cared 
for by his own forces during the war. 

D. The amount and intensity of direct bat- 
tle experience the individual had. 

E. The total amount of U. N. Forces propa- 
ganda of any kind received by the individual 
during the war. 

F. The total amount of U. N. Forces propa- 
ganda per medium received by the individual 
during the war: (1) leaflets; (2) loudspeaker; 
and (3) radio. 

G. Relative proximity of the propaganda re- 
ceived to action in front line battle. 

H. Degree of defection or change in the in- 
dividual’s accord with the aims and operations 
of his military forces. (criterion) 

I. Degree to which the individual was will- 
ing to, or sought to, surrender peacefully as 
opposed to forceful capture at the time he was 
taken prisoner. (criterion) 


Scales were designed to measure the relative
position of a person on each of the nine
factors. A combination of techniques was
used to estimate such positions. Several questions
with sets of alternative responses were
written for each factor, and the alternative
responses were designed in such a way as to
reflect a level of intensity or amount of the
experience or attitude being assessed. Examples
of such items are given below:


A-16. How would you characterize yourself in
terms of actions to uphold the principles
of the Peoples Government?

___ (1) Tried to be critical and show
        others the faults of communism.
___ (2) Was neutral and took no action
        one way or another.
___ (3) Was active in furthering the
        principles of the Peoples Government.
___ (4) Believed so firmly was willing
        to fight for these principles.


. As the war progressed, to what extent
did your living conditions such as food,
clothing, comfort, and medical care
change?

___ (1) Was always able to get along
        fairly well.
___ (2) Things were bad but never
        unbearable.
___ (3) At times things were nearly
        unbearable.
___ (4) Conditions became completely
        unbearable.

The questions on each factor were grouped
together. For the end of each of these sec- 
tions a 7-point rating scale was designed with 
descriptive anchors for each point regarding 
relative position on the scale for the particu- 
lar factor involved. In addition to these more 
formalized approaches an open-end interview 
system was devised for each of the nine fac- 
tors. 


Procedure 


The question forms were translated into 
Korean and into Chinese and printed in those 
languages. Through the cooperation of the 
Army officials in Japan and Korea the au- 
thors went to Korea, where a group of native 
Korean college graduates was selected and 
trained to serve as interviewers, interpreters, 
and translators. These men were trained in 
the procedures of the standardized interview. 
Military arrangements were made to allow 
the authors and the native members of the re- 
search team into the Prisoner-of-War camps 
in Pusan, and to furnish groups of POWs selected
according to several criteria.²

In the interview sessions rapport was es- 
tablished with relative ease, and certain con- 
ditions were arranged to assess the veracity of 
the reports. The interviewer in each case, 
after explaining the general procedure, read 
each question and the alternate responses, 
and checked the response selected by the 
prisoner. Any seemingly important discussion 
that was raised about an item was also re- 
corded by the interviewer. At the end of 
each section of the interview, the rating scale 
was described in detail and the prisoner indi- 
cated his judged position on it. After each 
rating scale was used, the interviewer dis- 
cussed the prisoner’s experiences in a general 
way to probe for further comments and de- 
scriptions relating to that particular factor. 
Full notes were recorded on these open-end 
parts of the interview, and the interviewer 
made his own rating of the prisoner based on 
his comments and discussion. This process
was continued through the schedule of nine
factors. The papers were then translated
back into English and brought back to the
United States for analysis.

² The exact nature of the criteria of selection cannot
be specified here, nor can the authors describe
the Prisoners other than by indicating they were
made up of several hundred Chinese and North
Koreans captured or surrendering during military
operations.

A single numerical index was desired for 
each prisoner on each of the nine factors. A 
group of research assistants worked on the 
forms to obtain a single rating on a new nine- 
point scale for each factor for each of the 
cases. These new ratings were based on 
judgments considering all the information re- 
corded. Each form was analyzed independ- 
ently by at least two assistants. Whenever 
differences in judgment of final rating value 
exceeded one scale point, the raters held dis- 
cussions to resolve the discrepancy. Com- 
promises of one scale position discrepancies 
were accepted as being sufficiently refined 
for the data available, and the sets of ratings 
were averaged. 
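
The rule for combining the independent ratings can be sketched as follows. The function name and the use of `None` to flag a case for discussion are illustrative choices, not part of the original procedure; only the nine-point scale, the one-point tolerance, and the averaging are taken from the text.

```python
def combine_ratings(rating_a, rating_b, max_gap=1):
    """Average two independent ratings on the nine-point scale.

    Returns None when the ratings differ by more than max_gap scale
    points, signalling that the raters must first discuss the case.
    """
    if abs(rating_a - rating_b) > max_gap:
        return None
    return (rating_a + rating_b) / 2


agreed = combine_ratings(4, 5)        # one-point discrepancy is accepted
needs_review = combine_ratings(2, 6)  # larger discrepancy goes to discussion
```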


Results 


The resulting data were processed by IBM
equipment and the ratings on the nine factors
were intercorrelated. The resulting correlation
matrix is presented in Table 1. A
correlation above .10 here is significant at the
1% level. One of the general findings of
some importance here is the fact that such
high correlations were obtained. With data
that were suspected to contain relatively
large errors of measurement such as these,
the finding of any correlation above .30 was
satisfying.
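
The dependence of the significance threshold on sample size can be illustrated with a rough sketch. It uses the large-sample normal approximation to the t test of a correlation, r = t / sqrt(n - 2 + t²), and the sample size of 660 is purely hypothetical; the report states only that several hundred prisoners were interviewed.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n, alpha=0.01):
    """Approximate two-tailed critical value of r for n cases (large n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # normal deviate for alpha
    return z / sqrt(n - 2 + z ** 2)


r_crit = critical_r(660)   # n = 660 is a hypothetical sample size
```

With samples of this order, quite small correlations reach the 1% level, so statistical significance alone says little about the strength of a relationship.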

The coefficients in rows H and I indicate 
the factors that relate to defection behavior 
and willingness to surrender respectively. 
Those factors that correlate with one of these 
two criteria also correlate in the same direc- 
tion and general magnitude with the other, 
and the two criterion scales correlate highly 
with one another. This expected consistency 
of results for the two criterion scales serves 
to indicate a core of reliability and credibility 
of the results for these two scales. 

Three of the factors for which scales had 
been constructed and which had been esti- 
mated to have some influence on defection 
and/or willingness to surrender did not show 
any significant relation to the two criteria. 
These particular scales were constructed for 
estimates of fear (B), amount of PW re- 
ceived by radio (F-3), and the relative prox- 
imity of the PW received to operations in 
front line battle (G). The items on in- 
tensity of fear probably did not work for 
Orientals; they did not correlate well with 
any other measures. The prisoners reported 
that they did not have radios available, and 
so scale F-3 could not be expected to give re- 
sults. 

Table 1
Obtained Correlations Among Specified Attitudes and Experiences of North Korean and
Chinese Prisoners of War *

Variables intercorrelated: A, War aims; B, Fear; C, Bad treatment;
D, Battle experience; E, PW received; F-1, Leaflets; F-2, Loudspeaker;
F-3, Radio; G, PW proximity; H, Defection; I, Surrender.
[Matrix entries not legible in this copy.]

* Correlations higher than .100 are significant at the 1% level on a two-tail test.


The morale factors contained in scales A
and C correlate higher with the criteria, H
and I, than do the Psychological Warfare factors
E and F-1. This result was expected,
and it is probably the case in any large-scale
military operation. The concern here is the
extent to which accuracy of prediction of defection
and surrender from amount of PW received
is so contaminated with the other factors,
such as morale, that one must say that
PW bears no significant or demonstrable influence.
Attempts were made to analyze this
problem by means of partial and multiple
correlations.

The net correlation between PW (factor E) 
and defection (H), partialling out accord with 
war aims (A), changes the correlation of .31 
to .26 which is still significant. When esti- 
mates of bad treatment (factor C) are also 
partialled out, this second order partial cor- 
relation of PW and defection (E and H) re- 
duces only to .22, which is still statistically 
significant. It would seem that Psychologi- 
cal Warfare does offer some effective influ- 
ence on the Oriental troops, independently of 
its conjoint action with lowered morale. This 
net relationship does not, however, appear 
to hold for predicting relative willingness to 
surrender peacefully. When the morale factors
are partialled out, the predictable effect
of PW on surrender behavior reduces to a
second-order partial correlation of only .09.
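
The first-order partial correlation used here follows the standard formula, sketched below. The value .31 for PW received with defection is taken from the text; the two correlations involving accord with war aims (-.18 and -.58) are assumed for illustration, since the printed matrix is not fully legible.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z partialled out (first order)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_EH = 0.31    # PW received (E) with defection (H), from the text
r_EA = -0.18   # assumed: PW received with accord with war aims (A)
r_HA = -0.58   # assumed: defection with accord with war aims (A)

r_EH_given_A = partial_r(r_EH, r_EA, r_HA)
```

With these assumed inputs the partial correlation comes out close to the .26 reported above. A second-order partial is obtained by applying the same formula again to first-order partials.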

Within the correlation table obtained there 
are several values that stand out as provoca- 
tive to consider. Only certain of the rela- 
tionships are summarized here. Because esti- 
mates of chronologically antecedent behavior 
are being dealt with here, it is more than 
usually compelling to attribute causality to 
the results. Analysis of the interrelationships 
shown in Table 1 appears to indicate that de- 
fection and surrender are behavior patterns 
that are less expected in the more seasoned 
troops of high morale, but are predictable 
among green troops of lower morale. This, 
of course, is practically an established prin- 
ciple based on rationalization and experience 
in warfare. However, the fact as brought 
out here serves to demonstrate some reli- 
ability of the data obtained and to corrobo- 
rate the view that morale is a primary target 
for Psychological Warfare. At least with 
Orientals, merely reiterating the desirability 
of surrender and giving suggestions about de- 
fection to enemy troops is not enough. It is 
also important to note in Table 1 that the
morale factors A and C correlate significantly 
with the total amount of PW received (E). 
This result may mean either that the PW was 
destructive of morale or that lowered morale 
sensitized the troops to the PW. 

It was desired to determine whether there 
are any influences of PW that are independ- 
ent of the morale determiners of defection 
and surrender behavior. Through second- 
order partial correlations one set of results 
was obtained, as previously described. Fur- 
ther analysis in terms of multiple regression 
was used to throw light on this problem. 

The multiple correlation of factors A, C, 
D and E with the criterion H, defection, is 
.66, and the regression equation in Beta-form 
for this criterion is presented below. The 
Beta-coefficients serve to indicate relative 
contribution of the factors mentioned: 


H’ = -.37A’ + .23C’ - .14D’ + .24E’
I’ = -.42A’ + .15C’ - .25D’ + .19E’


With the criterion I, willingness to sur- 
render, the multiple correlation with A, C, D 
and E is .65, and this regression equation in 
Beta-form is shown above. Comparison of 
these Beta-coefficients again indicates some 
of the differential influence of these particu- 
lar “determining” factors. 
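
The use of the two Beta-form equations can be sketched as follows. The Beta-coefficients are those printed above, with the partly illegible coefficient of C in the surrender equation read as .15 (an assumption); the prisoner's standard scores are hypothetical.

```python
BETAS_DEFECTION = {"A": -0.37, "C": 0.23, "D": -0.14, "E": 0.24}
# The .15 for C is an assumed reading of a damaged figure in the source.
BETAS_SURRENDER = {"A": -0.42, "C": 0.15, "D": -0.25, "E": 0.19}


def predict_standard_score(z_scores, betas):
    """Predicted criterion standard score from predictor standard scores."""
    return sum(betas[factor] * z_scores[factor] for factor in betas)


# Hypothetical prisoner: low accord with war aims, poor treatment,
# little battle experience, above-average exposure to PW.
prisoner = {"A": -1.0, "C": 1.0, "D": -0.5, "E": 0.5}

h_pred = predict_standard_score(prisoner, BETAS_DEFECTION)
i_pred = predict_standard_score(prisoner, BETAS_SURRENDER)
```

Because the equations are in standard-score form, the Beta-coefficients may be compared directly: accord with war aims (A) carries the largest weight in both.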

When the measure of defection and the
number of leaflets received are added to
the regression equation predicting surrender,
the multiple correlation is .76.
This correlation is for the prediction of sur- 
render behavior from a knowledge of all the 
other important factors. With variables of 
the type used and obtained under the rela- 
tively poor field conditions of measurement 
that necessarily existed in this study, a multi- 
ple correlation of .76 is extremely high. Of 
course, it contains the contribution of one cri- 
terion in predicting the other criterion, in the 
amount of .71. However, the defection attitudes
were presumably antecedent to the surrender
or capture by which the troops became prisoners.
In so far as each of the variables other
than the scale for surrender is a measure of 
some behavior occurring prior to the final 
actual surrender or forceful capture of the 
prisoners, this correlation of .76 would appear 
to indicate that peaceful surrender may be to
a great extent dependent on and predictable 
from the particular forms of attitudes and ex- 
periences measured by the scales devised for 
this study. It is understood that this study
requires cross-validation and also replication
with other groups.


Summary


Standardized interviews on North Korean 
and Chinese prisoners of war were carried out 
to test the relative importance of several atti- 
tudes and experiences in determining the de- 
fection attitudes on the part of the captive 
troops and their willingness to surrender 
peacefully at the time they were taken pris- 
oner. Among the experiences assessed was 
the amount of tactical psychological warfare 
the troops had received from the United 
Nations before becoming prisoners of war. 
“Scores” on each factor and experience were
derived as well as on the two criteria of de- 
fection and willingness to surrender. 

The primary results are presented in a cor- 
relation matrix, which is analyzed for certain 
relations and with respect to the general hy- 
pothesis that psychological warfare is effec- 
tive in changing behavior, but its effects are 
mainly of a precipitating nature that is dif- 
ferential for persons more sensitized to it by 
their morale and experiences. 

The primary correlations, certain partial 
correlations, multiple correlations, and stand- 
ard multiple regression coefficients were ana- 
lyzed and appeared to corroborate the major 
hypothesis. Additional relationships of pos- 
sible military and social importance are de- 
ducible from the data obtained. 


Received September 10, 1953. 


Predicting Achievement in Medical School: A Comparison of
Preclinical and Clinical Criteria 1

In a previous article in J. appl. Psychol.
(3), data were reported on the predictive effi- 
ciency of a trial selection test battery ad- 
ministered to an entering class of medical 
students. The criterion against which the 
tests were validated was the general grade av- 
erage at the end of the first year of medical 
school. Criterion data for later medical school 
performance have become available and pro- 
vide the opportunity for a follow-up of this 
earlier article. 

Validation studies of medical aptitude test 
batteries often employ first-year medical 
school grades as the criterion variable. The
use of these grades as an intermediate criterion
of medical success is defensible since
the successful completion of the first year is
necessary for continuance in medical school
and the drop-out rate may be higher during
this year than in other medical school years.
However, the usual question concerning the
relationship between an intermediate criterion 
and an ultimate one still remains. A medi- 
cal school curriculum can usually be divided 
into the first two preclinical years and the last 
two more clinical years. The performance of 
students in the latter two years, presumably, 
is generally more similar to their performance 
as physicians than their performance in the 
earlier years. It has been pointed out by 
Stalnaker (5) of the American Association of 
Medical Colleges that “. . . the grades given
in professional schools may have a special 
meaning. In the two preclinical years where 
basic science courses are usually taken, medi- 
cal schools have one teacher for each 4 to 5 
students. In the two clinical years, one 
teacher is used for each 1 to 2 students. 
Some schools have more full time teachers 
(or their equivalent) in the clinical years 
than they have students. Many--most--of
these clinical teachers are part time and
many are voluntary, i.e., unpaid. Grades
given under these conditions may have special
meaning.” The purpose of the present
study is twofold: first, to investigate the
relationship between preclinical and clinical
grades; and secondly, to compare the validities
of an aptitude test battery when the criterion
variable consists of preclinical grades
on the one hand and clinical grades on the
other.

1 Based upon a paper presented at the meetings of
the Midwestern Psychological Association in Chicago,
May, 1953.

This paper reports data obtained from a 
class of 129 medical students at the Indiana 
University School of Medicine. At the be- 
ginning of the first year of medical school, 
150 students were enrolled in this class; at 
the end of the third year 129 students re- 
mained. The two criteria employed were the 
general grade averages at the ends of the first 
and third years of medical school. These 
general averages are weighted averages of the 
grades obtained by a student in specific 
courses. Weights are assigned according to 
the amount of time a course meets. The spe- 
cific courses in the first and third years are 
listed in Table 1. 
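
The weighting rule just described (each course grade weighted by the amount of time the course meets) can be sketched as follows. The course names come from the first-year list; the grades and hour values are hypothetical, since the source does not report them.

```python
def weighted_grade_average(grades, hours):
    """General average: course grades weighted by the time each course meets."""
    total_hours = sum(hours[course] for course in grades)
    weighted_sum = sum(grades[course] * hours[course] for course in grades)
    return weighted_sum / total_hours


# Hypothetical grades and meeting hours for one student (not from the source).
grades = {"Physiology": 90.0, "Histology": 80.0}
hours = {"Physiology": 3.0, "Histology": 1.0}
average = weighted_grade_average(grades, hours)
print(average)
```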


The Relationship between Criteria 


The correlation between the two grade av- 
erages is .54. The mean of the first-year av- 
erages is 87.4; the standard deviation is 4.1. 
For the third-year averages the mean is 88.7 
and the standard deviation is 2.0. 

The correlations of the specific course 
grades in the first year with the specific 
course grades in the third year are presented 
in Table 1. In Table 1 the third-year courses 
are arranged in order of the size of their mean 
grades. This makes apparent a noticeable 
trend in these correlations. As the mean 
grade decreases, more and more of the correlations
between the first- and third-year
courses become statistically significant. Of
the six third-year courses with the highest
means, none correlates significantly with any
of the first-year courses. Of the six third-year
courses with the lowest mean grades,
four correlate significantly with all
four of the first-year grades; one
correlates significantly with three of
the first-year courses; and one correlates
significantly with two of the first-year
courses. No systematic difference in standard
deviations, shape of distributions or content
of the courses accounts for these results.
The means of the first-year course grades are
of the same size as the means of the low
third-year courses. If this is not an artifact
of this sample it may be that when high
grades are given for the third-year courses,
grading takes place on a different basis than
when the mean of the grades is lower.


Table 1

Correlations Between First-Year and Third-Year Grades with Means and
Standard Deviations of These Grades

[Table body not recoverable from the scan. Surviving column labels
(first-year courses): Physiol., Neuro-Anat., Gross Anatomy, Histol.
Surviving row labels (third-year courses, in the order printed):
Psychoneurosis, G. U. Surgery, Ophthalmology, Dermatology, Industrial Med.,
Epidemiology, Clin. Path. Lab., Pathology, Clin. Neurol., Obstetrics,
Anesthesia, Clin. Psych., Clin. Diagnosis, Cardiology, Pediatric Lect.,
Surgical Path., Medicine Recitation.]


Comparison of Validities for the
Two Criteria

A trial aptitude battery was administered
to the medical students at the beginning of
the first year of medical school. The tests in
the battery were the following:


1. The Differential Aptitude Tests—Space 
Relations (2). This test consists of items 
which require two-dimensional figures to be 
translated into their corresponding three-di- 
mensional objects. The rationale behind this 
test was that an important requirement of the 
medical student appears to be the ability to 
translate his two-dimensional text-book illus- 
trations into three-dimensional life objects. 

2. The United States Armed Forces Insti- 
tute Tests of General Educational Develop- 
ment, College Level, Test Three: Interpreta- 
tion of Reading Materials in the Natural Sci- 
ences (6). 

3. The Miller Analogies Test (4). 

4. The Army General Classification Test, the 
AGCT (1). 

The validities for each test for the first- and 
third-year general grade averages are given in 
Table 2. The changes in the coefficients from 
the first to the third year are not significant. 

Table 3 presents the first and third year 
validities for the medical Professional Apti- 
tude Test (7) which was administered to the 
students in this class. The changes in the 
coefficients from the first to the third year 
are not statistically significant. 

Table 2

Intercorrelations, Validity Coefficients, Means and Standard Deviations for the Tests in the Trial Battery

                              Intercorrelations      Validities
Test                           2      3      4     1st year  3rd year     M     S.D.
1. Reading Interpretation      [illegible]          .4[?]      .39      67.3    8.2
2. Miller Analogies                  .53    .40     .28        .22      60.2   12.1
3. AGCT                                [illegible]  .04        .13     117.2    9.2
4. Space Relations                                  .16        .13      62.1   15.4


For the trial test battery the multiple correlation
of the battery with the first-year criterion
is .47, with the third-year criterion .39.
The multiple correlation of the best predictors
in the Professional Aptitude Test with
the first-year criterion is .43, with the third-year
criterion .39. (It should be pointed out
that some prior selection had taken place on
the Professional Aptitude Test upon entrance
into medical school. The primary interest in
this paper, however, is the change in validity
and not the absolute value of the coefficients.)
Indication of the overlap between
groups when selection is based on each of the
two criteria can be obtained by using the regression
equation to predict the preclinical
and clinical grades for each student and then
correlating the two sets of predicted grades.
When this is done for the trial battery the
correlation coefficient is .40. With this degree
of relationship it is possible that, for this
kind of test battery, selection based upon preclinical
or clinical criteria can make a substantial
difference in the groups selected.


Table 3

Correlations between PAT Scores and General Averages
and the Means and Standard Deviations
of the PAT Scores

                                      r
Test Score                    1st year  3rd year    S.D.
Verbal Ability
  Scientific                    .28       .27       82.2
  Social                        .15       .18       75.3
  Humanistic                    [illegible]         86.8
  Composite                     [illegible]         74.1
Quantitative Ability            .28       .29       80.5
Index of General Ability        [illeg.]  .31 (?)   71.1
Modern Society                  [illegible]         70.5
Premedical Science              [illegible]         67.5

[The column of means is not recoverable from the scan.]


The results comparing preclinical and
clinical criterion grades show no significant
changes in test validity. The tests predict
preclinical and clinical achievement equally
well. For the trial test battery, comparison
of the predicted scores based upon the
two different criteria indicates that the groups 
selected on the basis of each criterion might 
be quite different. It can be assumed that 
achievement in the clinical years is a better 
indication of performance as a physician than 
achievement in the preclinical years. If this 
is the case, then a selection test battery 
should consist of predictor variables which 
concentrate upon predicting achievement in 
the clinical years. Along these lines future 
test development might well be devoted to 
the following: (a) the isolation of behaviors 
which are unique to clinical achievement as 
compared with preclinical achievement; (b) 
the development of reliable measures of these 
criterion behaviors; and (c) the development 
of testing techniques to predict these be- 
haviors. 
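
The overlap check reported earlier in this article (predict preclinical and clinical grades from the same battery, then correlate the two sets of predictions) can be sketched as below. This is an illustration under stated assumptions, not the authors' computation: the regression weights and test scores are hypothetical, and the helper names are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def predicted_grades(scores, weights, intercept=0.0):
    """Apply one regression equation to every student's test scores."""
    return [intercept + sum(w * s for w, s in zip(weights, row)) for row in scores]


def selection_overlap(scores, weights_preclinical, weights_clinical):
    """Correlate predictions made from the two criterion-specific equations."""
    return pearson(predicted_grades(scores, weights_preclinical),
                   predicted_grades(scores, weights_clinical))


# Hypothetical scores on two tests for four students, and hypothetical weights.
scores = [[1, 0], [0, 1], [1, 1], [2, 0]]
overlap = selection_overlap(scores, [1.0, 0.0], [0.0, 1.0])
print(round(overlap, 3))
```

A coefficient near 1 would mean the two equations rank applicants almost identically; the lower the coefficient, the more the selected groups can differ.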


Received September 19, 1953. 
The present study presents further data on
a line of investigation opened up by Munroe 
(1). In the original study the hypothesis 
was tested that something of the dynamics of 
personality might be revealed in the difference
between the “Q” and “L” scores (Q-L)
derived from the American Council on Edu- 
cation Psychological Examination (ACE). 
The present study is reported here because 
the findings, derived through pattern analysis 
as suggested by Munroe, provide implications 
for personality study and test interpretation. 


Background 


The rationale behind the hypothesis pre- 
sented by Munroe was based upon studies of 
scatter analysis, particularly those of Ra- 
paport (2) on the Wechsler-Bellevue scale. 
Her procedure was to administer ACE and 
the Rorschach tests to 80 students. The 
Rorschach responses were then related to the 
difference between the individuals’ “Q” and 
“L” percentile standings. On the basis of 
whether the “Q” or “L” percentile score was 
the higher, subjects were classified into two 
groups, one called the higher Q group and the
other called the higher L group. The Ror- 
schach entries for each group were then ana- 
lyzed for differences. 

There were found to be: (a) significantly
more V entries (lack of accurate form) for
the higher L group; (b) significantly more F
entries (responses in which form was the determinant)
for the higher Q group; and (c) significantly
more M (movement) entries for
the higher L group. Description of the two
groups based on differences in Rorschach
indices might be as follows: The higher Q
group gave responses which show objectivity
through elaboration by careful observations
of objective details, formal intellective approach,
repressive efforts at control of affect
and inhibition of normal creative imagination
and of normal structuring of perception.

1 This study was conducted while the author was
employed by the Human Resources Research Institute,
Maxwell Air Force Base, Alabama.

The higher L group gave responses which 
show subjectivity and imagination through 
cues serving as springboards to new ideas, 
lack of objectivity, creative organization and 
a subjective approach, sometimes to excess. 
It should be noted that Munroe did not 
verify the findings for either group by cross 
validation. 

Roe (3) indicates that the syndrome represented
by the higher Q group is similar to
that found in paleontologists as reflected in 
their Rorschach protocols. The presentation 
of this fact was further supported by citing 
evidence from an unpublished study by Mun- 
roe wherein it was found that in a sample of 
college students the higher Q group tend to
choose more scientific and art subjects while 
in college than does the higher L group. 


Statement of the Problem 


The summary of findings presented above 
would seem to indicate that the Q-L constel- 
lation (or pattern) may be related to differ- 
ences in the utilization of intelligence arising 
from differences in personality (4). How- 
ever, since Munroe’s findings were based on 
groups at the extremes of a continuum (the 
top and bottom quartiles of the distribution 
of Q-L scores) and since they were a unique 
sample, it appeared desirable to test the origi- 
nal findings with evidence from other popula- 
tions. 

Specifically, it is intended in this study to
examine the findings that high Q and high L
scores reflect different personality syndromes 
through empirical demonstrations of the re- 
lationship of these scores to occupational and 
curricular selections made by the individual. 


Procedure

ACE was administered to subjects in an advanced
military school. The average age of
the sample was 32 years. They averaged
two years of college education, and were
professionally well established. 

In calculating the difference between the 
“Q” and “L” scores a minor variation of the 
procedure used by Munroe was made. This 
variation was made in determining the rela- 
tive standing of the individuals through the 
use of the standard score rather than in terms 
of the percentile standing. Standard scores 
for the “Q” and “L” scores were computed 
on the basis of the distribution of the groups 
to which the ACE tests were administered. 

All calculations in the study were based on 
the scores of the entire population except in 
individual cases where complete data were 
not available. Where groups were dichoto- 
mized on the Q-L variable all cases which 
had a difference greater than zero, regardless 
of how small the difference, were placed in 
either the high Q or high L categories.
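
The scoring procedure described above can be sketched as follows: convert the “Q” and “L” raw scores to standard scores within the tested group, take the difference, and classify on the sign of the difference however small it is. The standard-score scale (mean 50, S.D. 10) and the handling of exact ties are assumptions made for the illustration; the source specifies neither. The sketch also assumes the raw scores are not all identical.

```python
from statistics import mean, pstdev


def standard_scores(raw, center=50.0, spread=10.0):
    """Standard scores based on the distribution of the tested group itself.

    The 50/10 scale is an assumption; any linear scale gives the same signs.
    """
    m, s = mean(raw), pstdev(raw)
    return [center + spread * (x - m) / s for x in raw]


def classify_q_minus_l(q_raw, l_raw):
    """Label each case by the sign of its Q-L standard-score difference."""
    labels = []
    for q, l in zip(standard_scores(q_raw), standard_scores(l_raw)):
        difference = q - l
        if difference > 0:
            labels.append("higher Q")
        elif difference < 0:
            labels.append("higher L")
        else:
            labels.append("tie")  # exact ties are not discussed in the source
    return labels


# Hypothetical raw scores for three subjects.
labels = classify_q_minus_l([40, 50, 60], [80, 70, 60])
print(labels)
```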


Results 


The first step in the analysis was to deter- 
mine the relationship between the ACE “Q,” 
“L,” total and Q-L scores for the population
studied. These relationships are shown in 
Table 1. 

Table 1

Intercorrelations of ACE Scores

[Table body not recoverable from the scan. The only surviving entry is a
coefficient of .57 in the “Q” row; its column cannot be determined.]


Table 2

Q-L Scores Obtained by Flying and Ground Personnel

                                      Aero-Rating
                                     Flying
              Total      Pilots        Non-Pilots        Ground
Q-L Score       N      N   Per Cent   N   Per Cent    N   Per Cent
Higher Q       220    154     57      16     43       50     36
Higher L       226    116     43      21     57       89     64

Total          446    270    100      37    100      139    100

χ² = 16.86; N = 2; p = <.001


The major hypothesis was that the Q-L
pattern is related to the kinds of occupational
and curricular choices made by individuals.
A sub-hypothesis was formulated that pilots
would have different Q-L patterns than non-pilots.
The rationale was that whether an
individual becomes a pilot is initially a matter
of choice although the non-pilot population
may not be as homogeneous with respect
to choice. Thus, the pilot population may be
considered to represent a group of individu- 
als who expect to utilize their intelligence and 
skills in a specified manner. Accordingly, be- 
cause the demands of the pilot’s job are 
highly technical and require a high degree of 
objectivity it was expected that more pilots 
would be in the higher Q group than non-pilots.
Because the non-pilots are in administrative
positions it was hypothesized that they
would tend to be in the higher L group. 

The numbers of pilots, flying officers other
than pilots (navigators, bombardiers and observers)
and ground officers in the higher Q
and higher L groups are shown in Table 2.
The chi-square for this distribution is significant
(p < .01). The greatest contribution to
chi-square was found in the difference between
the numbers of non-flying individuals
in the higher Q and higher L groups,
although each of these classifications con- 
tributed to the total chi-square. 
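
The chi-square for Table 2 can be recomputed as a check. The sketch below uses the cell counts as read from the scan (two garbled cells, 116 and 21, are recovered from the column totals) and the standard expected-frequency formula; it gives a value close to the reported 16.86 and shows that the ground officers contribute most. The upper-tail probability uses the fact that, with 2 degrees of freedom, it equals exp(-χ²/2).

```python
from math import exp

# (higher Q, higher L) counts per aero-rating, as read from Table 2.
OBSERVED = {
    "Pilots": (154, 116),
    "Flying non-pilots": (16, 21),
    "Ground": (50, 89),
}


def chi_square(table):
    """Return total chi-square and each row's contribution for an r x 2 table."""
    rows = list(table.values())
    column_totals = [sum(row[j] for row in rows) for j in range(2)]
    grand_total = sum(column_totals)
    contributions = {}
    for name, row in table.items():
        row_total = sum(row)
        part = 0.0
        for j, observed in enumerate(row):
            expected = row_total * column_totals[j] / grand_total
            part += (observed - expected) ** 2 / expected
        contributions[name] = part
    return sum(contributions.values()), contributions


chi2, parts = chi_square(OBSERVED)
p_value = exp(-chi2 / 2)  # exact upper tail for 2 degrees of freedom
print(round(chi2, 2), {k: round(v, 2) for k, v in parts.items()}, p_value)
```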

Since data on another population of about 
400 more personnel were available, it was de- 
cided to duplicate the first analysis as a 
check. Trends in this second population were 
as distinct as the ones shown in Table 2. The 
chi-square was lower but was significant at 
the .02 level of probability. The consistency 
in the two populations is attributable to the 
fact that the major contribution to chi-square 
is made by the differences between the numbers
of non-flying personnel in the higher Q
and higher L categories. F ratios for the 
ACE “Q,” “L” and total scores were not sig- 
nificant for these populations. 
A second sub-hypothesis was made that
there would be a difference between the num- 
ber of reserve officers and regular officers in 
the higher Q and higher L categories. The 
rationale was that the regular officers rep- 
resented a homogeneous group of individu- 
als who selected the Air Force as a career, 
whereas, the reserve officers represented a 
more heterogeneous group of individuals with 
respect to making a career in the Air Force. 
In the first sample there was a tendency for 
more regular officers to be located in the 
higher Q category than in the higher L cate- 
gory and more reserve officers to be located 
in the higher L than in the higher Q category.
The chi-square for differences in these
distributions was not significant (p < .20 > 
.10). The same tendency existed in the sec- 
ond population but again the difference in the 
distributions was not significant (p < .30 > 
.20) by the chi-square test. These findings 
indicate that the Q-L constellation does not
differentiate between these two groups. 

A third sub-hypothesis was that there
would be differences between groups of individuals
in different areas of greatest job experience.
This hypothesis, as in the cases of
the other sub-hypotheses, was selected because
the area of job experience appeared to
represent the kind of occupational activity in
which the individual was most interested.
Consequently, the inference was made that
the job areas involving planning
(e.g., comptroller) would be filled by individuals
with higher L scores and job areas
involving more mechanical or technical routine
requirements would be filled by individuals
with higher Q scores.

The means and standard deviations of the 
Q-L scores for individuals in each job area 
are shown in Table 3. The F ratio for the 
data in this table is 2.07 (p < .05 > .01). 
The data clearly indicate that the Q-L scores 
differentiate between individuals in the main- 
tenance-inspection type of function and the 
individuals in the intelligence-comptroller
type of function. Those in the former func- 
tions tend to have higher Q scores and those 
in the latter functions tend to have higher L 
scores. Means for the second population were 
calculated, and the F test was again applied. 
The F ratio for the variances between groups
in the second population was found to be sig- 
nificant. In addition, the F ratio for vari- 
ances between the two populations was also 
found to be significant. Differences between 
the populations were found to be attributable 
to the Q-L scores for individuals in the 
maintenance and operations areas. The mean 
for the maintenance subjects in the new population
was 49.1 ± 1.6 and the mean for the
operations subjects was 47.9 ± .8.

The original “Q” and total scores of the 
ACE failed to discriminate between subjects 
in the various job areas. The F ratio for
each of these sets of scores fell short of
significance (p > .05). The F of 2.1,
however, was significant (p < .05) for the
original “L” scores of subjects in these job 
areas. In decreasing order from high “L” 
scores to low “L” scores, as shown in Table 
4, were these job areas: intelligence, research, 
communications, comptroller, supply, person- 
nel, administration, operations, inspection and 
maintenance. There is some tendency for the 
ACE “L” score to discriminate between the 
job areas for this population in the same or- 
der that the ACE Q-L score does. There is, 
however, enough difference to indicate that 
something different is being measured by the 
Q-L pattern from that being measured by 
the “L” score.
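
The ordering of job areas on the “L” score can be checked against the “Q” and total means that survive in Table 4. The sketch below assumes that the ACE total score is the sum of the “Q” and “L” scores, so that each area's “L” mean is its total mean minus its “Q” mean; the table values are as read from the scan, and the helper name is hypothetical.

```python
# (mean "Q" score, mean total score) per job area, as read from Table 4.
TABLE_4 = {
    "maintenance": (40.8, 108.2),
    "inspection": (39.2, 108.0),
    "communications": (43.5, 121.3),
    "research": (45.0, 125.8),
    "operations": (41.5, 116.6),
    "personnel": (40.7, 116.3),
    "administration": (40.1, 115.6),
    "supply": (40.6, 117.3),
    "intelligence": (41.6, 126.4),
    "comptroller": (36.7, 113.9),
}


def implied_l_means(table):
    """L mean implied by total = Q + L (an assumption about ACE scoring)."""
    return {area: round(total - q, 1) for area, (q, total) in table.items()}


l_means = implied_l_means(TABLE_4)
ranking = sorted(l_means, key=l_means.get, reverse=True)
print(ranking)
```

Under that assumption the implied “L” means reproduce the order of job areas listed in the text, from intelligence down to maintenance.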


Table 3

Mean Q-L Scores of Individuals in Ten
Occupational Areas

Occupational
Area                 Mean
Maintenance          53.9
Inspection           51.2
Communications       51.0
Research             50.9
Operations           50.3
Personnel            49.1
Administration       48.7
Supply               48.5
Intelligence         45.0
Comptroller          44.1
All other            49.4
Unknown              45.6

[Standard Error of Mean column: only one entry, 1.2, survives in the scan,
and its row cannot be determined.]

F = 2.1; p = <.05 >.01.
Table 4

ACE Total and Subscore Means of Personnel in Ten Occupational Areas

                      “Q” Score      Total Score
Occupational
Area                      M           M      S.E.
Maintenance             40.8       108.2     3.5
Inspection              39.2       108.0     6.9
Communications          43.5       121.3     7.9
Research                45.0       125.8     5.0
Operations              41.5       116.6     2.1
Personnel               40.7       116.3     3.8
Administration          40.1       115.6     3.1
Supply                  40.6       117.3     4.5
Intelligence            41.6       126.4     4.4
Comptroller             36.7       113.9     8.2
All other               42.9       121.1     3.1
Unknown                 42.0       128.4     8.3

[The “L” score means and the standard errors of the “L” and “Q” means
survive only as displaced fragments and are not reproduced.]

F = 2.1, p = <.05 >.01 for the “L” score; p > .05 for the “Q” and total scores.


A final hypothesis was that the Q-L pattern
would discriminate between individuals
who chose different fields of specialization in
college. This variable was selected because
it too represented a situation in which choice
was a factor. Fields of specialization in college
were grouped into five classifications for
purposes of analysis. The science group was
represented by such college majors as psychology,
geology, mathematics and zoology;
the arts group by majors in liberal arts, history,
music and English literature; the technical
group by majors in the applied areas
(excluding engineering) such as education,
law, social casework and agriculture. The engineering
and business administration groups
were composed of majors in these respective
fields of specialization.

The percentage of subjects in the higher Q
group and in the higher L group for each of
the five types of college majors is shown in
Table 5. The differences in proportions of
higher Q and higher L subjects in these classes
of college majors are significant (p < .02 and
> .01) by the chi-square test. These findings
do not support those of Munroe and Roe
who found in their studies that students selecting
science subjects tended to have higher
Q than higher L scores.

The F ratios between fields of specialization
for the “Q,” “L,” and total ACE scores
were significant (p < .01). These are shown
in Table 6.

Students who had specialized in science, 
arts, and engineering achieved higher scores 
on the ACE “L” score than did those who 
majored in technical and business adminis- 
tration courses. Students who majored in 
technical courses achieved lower scores on 
the ACE “Q” scores than did any of the 
other groups. Majors in engineering, arts 
and science achieved higher total scores than 
did those with business administration and 
technical fields of specialization. 


Table 5

Proportions of Students in Each Field of Specialization
with Higher Q and Higher L Scores

                             Per Cent in     Per Cent in
                             Higher Q        Higher L
College Major                Group           Group
Arts                           43              57
Sciences                       48              52
Technical                      54              46
Engineer                       57              43
Business Administration     [illegible]     [illegible]

Total N = 418

χ² = 12.41; N = 4; p = <.02 >.01.
Table 6

ACE Total and Subscore Means for Personnel with Different College Majors

                                   “L” Score      “Q” Score     Total Score
College Major               N      M     S.E.     M     S.E.        M
Science                    100    78.8   1.6     45.2   0.9       124.1
Arts                        83    80.1   1.8     46.2   1.0       125.8
Technical                   74    73.0   1.9     42.4   1.0       115.5
Engineer                    94    78.3   1.7     47.8   1.0       126.0
Business Administration    [?]    72.0   1.8     45.4   1.0       117.4

[Columns are realigned from displaced fragments in the scan; the standard
errors of the total score are not recoverable. Two F ratios survive, 3.75
and 3.71, each with p < .01; the third is not recoverable.]



Summary 


The present study was conducted to ex- 
amine the findings of Munroe that the high 
Q and high L patterns (derived from ACE 
sub-scores) reflect different personality syndromes.
To demonstrate whether the original
findings would apply in a different research
situation the Q-L scores were studied
in relationship to occupational and educa- 
tional differences. 

A sample of Air Force officers was used in 
the present study. The criteria used were as 
follows: the rating (flying or ground) of the 
officer; the officer’s assignment to the regular 
or reserve corps; the officer’s career field; 
and the officer’s college major. The findings 
were that: pilots tended to have higher Q
scores, non-pilots had higher L scores; individuals
in maintenance and inspection jobs had higher Q
scores, those in intelligence and comptroller jobs
had higher L scores; personnel with college
majors in arts and sciences had higher L 
scores whereas individuals in the applied 
areas, engineering and business administra- 
tion had higher Q scores. No difference was 
found in the Q-L patterns of reservists and 
regular officers. 

Although these data indicate a relationship 
between the Q-L pattern and occupational
and educational choices there is a question as 
to whether this pattern represents a predis- 
posing factor or whether it emerges as a re- 
sult of experience in certain areas. 

The original ACE scores were not found to 
be consistently related to these situations. 
No differences were found between the ACE
“Q,” “L,” and total scores of flying and
ground personnel. The ACE “Q” and total
scores did not discriminate between individu- 
als in different job areas although the “L” 
score did discriminate in about the same or- 
der as the Q-L pattern. Each of the original 
ACE scores was related to the college majors 
but not in the same manner as was the Q-L
pattern score. 

It would appear from this evidence that 
there is a relationship between the ACE Q-L 
pattern and the utilization of intelligence by 
the individual. As further studies are per- 
formed using one or another form of pattern 
representation, it is reasonable to expect that 
more will be revealed through test score pat- 
terns than through independent scores or 
summation of these scores. Certainly the use 
of a clinical approach to the understanding of 
personality dynamics underlying pattern rep- 
resentations must be considered a useful first 
step in the development of hypotheses for 
empirical investigations. 


Received August 10, 1953. 
Correspondence courses form an impor-
tant communication medium for education in 
both military and civilian instructional areas. 
These courses may take the form of highly 
commercialized businesses independent of an 
educational institution, home study courses 
conducted by schools, or extension courses 
supported and administered by military and 
other governmental agencies. The number of 
students enrolled in such courses probably 
numbers 100,000 or more per year. Despite 
the importance of this particular segment of 
our educational system a review of the litera- 
ture reveals a most obvious scarcity of studies 
applying directly to effective methods for the 
presentation of these courses. Accordingly, 
when the question concerning the most effec- 
tive method of presenting correspondence 
courses occurred in a military extension 
course institute a ready answer was not avail- 
able in the literature. Consequently, it was 
decided to conduct an experiment in which 
the correspondence course students would be 
used as the experimental population. The 
present report is a summary of the findings 
from this experiment. 


The Problem 


The problem in this study was to deter- 
mine the most adequate methods of present- 
ing course materials for effective student 
achievement. Two major hypotheses were 
the specific foci of the experiment. The first 
was that three styles of presenting corre- 
spondence course materials would result in 
differential student achievement. The second
hypothesis was that quality control, as imposed
by examining conditions, would affect
the achievement of the students and their retention
of level of achievement.

* This study was conducted in the Officer Education
Division of the Human Resources Research Institute,
Maxwell Air Force Base, Alabama. Dr.
Lora McDonald, Extension Course Institute of the
Air Force, provided useful guidance and assistance
in the design and administration of the experiment.


Procedure 


The study was conducted with applicants 
enrolling for a physical training course. This 
course was at the Officer Candidate School 
level of difficulty and was devoted to an un- 
derstanding of the development of physical 
education programs for combat fitness. 


The manual or text in the course was prepared 
in three different “styles.” Style A was a manual 
written in a popular and personal manner with 
several illustrations of the cartoon variety. This 
manual was commonly referred to as the Popular 
Science style for descriptive purposes. Style B 
was written in the formal expository manner 
commonly used in textbooks. Detailed illustra- 
tions were used in the text manual. Style C was 
actually a study guide divided into several “lessons”
or units. Each unit had its major objective(s),
references, and questions. An Air Force
field manual was provided with the study guide 
for reference purposes. This style was known 
as the “Chicago” style because it was generally 
fashioned after the syllabi used in the University 
of Chicago home study courses. 

The research design is described briefly below. 

1. All enrollees were administered, through the 
mails and before receiving course materials, a 
pre-test of fifty items. 

2. Each enrollee was then assigned to one of 
the following experimental groups: 


Kind of Examination          Style of Material

Open book examination        Style A    Style B    Style C
Closed book examination      Style A    Style B    Style C


Assignment to these groups was made on the 
basis of pre-test scores so that equal numbers of 
students in each quartile would appear in each 
of the experimental groups. 

3. Enrollees in the groups taking the open 
book examination received the “final” examina- 
tion at the time they received the course mate- 
rials to complete in any manner they desired. 

Enrollees in the groups taking the closed book 
examination named a proctor. When the individual
felt he was ready to take the examination,
he reported to the proctor who, in turn,
administered the examination. The completed 
examination was returned by the proctor, with 
certification, to the administration center. 

Course materials were returned by both groups 
at the time the examinations were returned to 
the administration center. 

4. Thirty days after taking the final examina- 
tion, students took the same examination a sec- 
ond time. This test was known as the “reten- 
tion” examination. Enrollees in both open and 
closed book examination groups took the reten- 
tion examination without course materials. The 
“open book” group took the retention examina- 
tion without a proctor and the “closed book” 
group under the same proctorship as in the origi- 
nal administration. 

5. When the retention examination was re- 
ceived by the administrative office the course 
materials were returned to the student for his 
files. 

6. Enrollees were notified at the start of the
study that they were to participate in a research
project, that they would be notified of each step
in the research only just before it was to occur,
and that they would be informed when the research
was completed.

The course was administered through regular 
administrative mailing procedures. A total of 
900 enlisted airmen enrolled in the course over 
a three month period. Eight months later a 
total of 353 individuals completed all steps re- 
quired in the research. The results reported 
here are based on this group. 
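The quartile-balanced assignment in step 2 can be sketched as follows. This is only an illustration under stated assumptions: the report does not say how enrollees were ordered within a quartile, so a random shuffle is assumed, and the function name, the 96 invented scores and the seed values are all hypothetical.

```python
import random
from collections import defaultdict

def assign_by_quartile(scores, groups, seed=0):
    """Spread enrollees over groups so each pre-test quartile is balanced.

    scores: dict mapping enrollee id -> pre-test score.
    groups: list of group labels (six cells in the 2 x 3 design).
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)      # lowest to highest score
    n = len(ranked)
    assignment = {}
    for q in range(4):                           # four quartile blocks by rank
        block = ranked[q * n // 4:(q + 1) * n // 4]
        rng.shuffle(block)                       # assumed: random order within a quartile
        for k, enrollee in enumerate(block):
            assignment[enrollee] = (groups[k % len(groups)], q)
    return assignment

groups = [(exam, style) for exam in ("open", "closed") for style in "ABC"]
demo_rng = random.Random(1)
scores = {i: demo_rng.randint(15, 45) for i in range(96)}   # invented scores
assignment = assign_by_quartile(scores, groups)

# Count enrollees per (group, quartile) cell.
counts = defaultdict(int)
for group, quartile in assignment.values():
    counts[(group, quartile)] += 1
```

With 96 invented enrollees every one of the 24 group-by-quartile cells receives the same number of cases, which is the property the design calls for.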


Results 


The means for each of the groups on the
pre-test¹ are shown in Table 1. Although
enrollees were assigned in equal numbers to
each of the experimental groups, more students
in the open book examination groups
completed the course than did those in the
closed book examination groups.


Table 1

Pre-Test Scores for Each of the Experimental Groups

            Closed Book Exam          Open Book Exam
Style*      N      M       S.D.       N      M       S.D.
A           49     32.4    3.9        68     32.3    4.7
B           46     32.9    3.5        73     31.7    4.1
C           51     31.9    3.7        66     31.3    2.1

F (open vs. closed book) = 2.2; p > .05. F (between styles) = 1.6; p > .05.

* See text for description.

¹ The pre-test was an examination used for previous
classes of enrollees. A new examination was
used for the “final” examination in the experiment.


Table 2

Final Examination Scores for the Experimental Groups

            Closed Book Exam          Open Book Exam
Style*      N      M       S.D.       N      M       S.D.
A           49     42.5    5.38       [illegible]
B           46     44.2    7.3        [illegible]
C           51     45.2    8.7        [illegible]

F (between styles) = 2.8; p > .05. t (closed vs. open book) = 6.5; p < .01.

* See text for description.

The variance between the “open book” and 
“closed book” subjects’ performance on the 
pre-test was not significant (p > .05) as re- 
vealed by the F of 2.22. The F of 1.59 was 
not significant (p > .05) for the variance be- 
tween pre-test scores for groups of subjects 
assigned each type of material. On the basis 
of these data, and in the absence of more 
definite information about the subjects’ characteristics,
it was assumed that the groups
were from the same population. 

The second step in the analysis was to de- 
termine differences in performance of each of 
the experimental groups on the final examination.²
The mean achievement scores and
standard deviations are shown in Table 2 for 
each of the experimental groups. 

The difference between the achievement of 
subjects who were assigned course material 
written in different styles was not significant. 
Subjects taking the open-book examination, 
however, achieved significantly (p< .01) 
higher scores than those taking the closed- 
book examination (t = 6.54). 

A similar analysis was made for the retention
test results. The retention test was in
every respect the same test that was given
for the final examination. It was given thirty
days after receipt of the final examination
and was administered in the absence of formal
course materials. Again the style in
which the course was written appeared to
have no significant effect on the retention of
the students’ achievement level. The quality
control did, however, have an effect. The
achievement scores of individuals who took
the closed-book examination were significantly
(t = 2.96) lower than those of individuals who
took the open-book examination. The means and
standard deviations for these groups are
shown in Table 3.

² The final examination was pre-tested on three
groups: individuals who had taken the course,
individuals in physical training and individuals who had
neither taken the course nor had physical training
experience. Items were selected on the basis of
discrimination function and on reliability. The
Kuder-Richardson reliability coefficient of the total
instrument was .88.


Table 3

Retention Examination Scores for the Experimental Groups

            Closed Book Exam          Open Book Exam
Style*      N      M       S.D.       N      M       S.D.
A           47     42.3    5.9        67     45.0    7.1
B           45     41.0    12.7       70     45.6    8.7
C           51     44.5    8.6        67     45.7    8.3

F (between styles) = 0.9; p > .05. t (open vs. closed book) = 3.0; p = .01.

* See text for descriptions.
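The open versus closed book comparison on the retention examination can be approximated from the printed group sizes, means and standard deviations. The sketch below is not the author's computation: it assumes each printed S.D. is a root-mean-square deviation about its own group mean, pools the three style groups within each examining condition, and uses an unpooled standard error. The result should fall in the neighborhood of the reported t rather than reproduce it exactly.

```python
import math

# (N, mean, S.D.) per style, read from the retention examination table
closed = [(47, 42.3, 5.9), (45, 41.0, 12.7), (51, 44.5, 8.6)]
open_book = [(67, 45.0, 7.1), (70, 45.6, 8.7), (67, 45.7, 8.3)]

def pool(groups):
    """Combine style groups into one (N, mean, variance) triple.

    Assumption: each printed S.D. is a root-mean-square deviation
    about its own group mean (divisor N).
    """
    n = sum(g[0] for g in groups)
    mean = sum(g[0] * g[1] for g in groups) / n
    ss = sum(g[0] * (g[2] ** 2 + (g[1] - mean) ** 2) for g in groups)
    return n, mean, ss / (n - 1)

def t_ratio(a, b):
    """Difference between two means divided by its (unpooled) standard error."""
    (na, ma, va), (nb, mb, vb) = a, b
    return (ma - mb) / math.sqrt(va / na + vb / nb)

t = t_ratio(pool(open_book), pool(closed))
print(round(t, 2))
```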

The average individual loss between the
final and retention examinations, and the extent
to which the experimental variables were
a factor in retention of achievement level, are
shown in Table 4. Differences in losses were
not found to be significant for the types or
styles of materials. The losses in achievement
level of those subjects who took the
closed-book examination, on the other hand,
were significantly less (t = 5.67) than the
losses in achievement of subjects who took
the open-book examination.


Table 4

Losses During the Time Between the Final Examination and the
Retention Examination for Each of the Experimental Groups

            Closed Book Exam          Open Book Exam
Style*      N      M       S.D.       N      M       S.D.
A           [illegible]
B           42     -0.5    4.6        69     -3.8    5.6
C           51     -0.7    3.6        67     -2.6    5.82

F (between styles) = 0.8; p > .05. t (open vs. closed book) = 5.8; p < .01.

* See text for description.


Summary 


The present study is a report of an experi- 
ment conducted with a correspondence course 
population. Two hypotheses were investi- 
gated during the course of the experiment. 
One hypothesis was that three different styles 
of presenting course materials would have 
differential effects on student achievement 
and retention of achievement level. The sec- 
ond hypothesis was that the degree of quality 
control, as imposed by the open and closed 
book examination, would have no effect on 
the achievement level and retention of the 
achievement level by students. 

The styles of presenting course materials 
(popular, expository, and study guide) were 
not found to be different in their relative ef- 
fectiveness as measured by an achievement 
examination. Nor did these methods affect 
the retention of the achievement level. On 
the other hand, the subjects who used the ex- 
amination with reference to course materials 
(open book examination) had higher final 
and retention examination scores than did 
those students who took the examination un- 
der monitorship without the use of the text 
materials (closed book examination). The 
subjects who took the closed book examina- 
tion maintained their original achievement 
level while those who took the open book 
examination made significant losses over a 
thirty-day period. 

The administrative procedures required in 
conducting the experiment with correspond- 
ence course populations were found to be too 
ponderous for practical purposes. The rec- 
ommendation is made that hypotheses be 
tested with more readily available populations 
of subjects and the results applied to corre- 
spondence course usage. 


Received August 21, 1953. 
One of the problems of item analysis is the
standard to be employed in selecting items
for inclusion in a final scoring key.³ Typically,
some level of confidence is arbitrarily
chosen and those items which discriminate at 
this level are selected. Guilford (6) as well 
as others have indicated that the 5% and 1% 
levels of confidence should be used as guides 
when selecting items. 

Little consideration has been given in the 
literature to the influence of the size of the 
item analysis sample upon the level of con- 
fidence at which any given item is likely to 
discriminate. Most writers have assumed the 
availability of large samples, and so this prob- 
lem has probably not been considered particu- 
larly important. Large samples, however, are 
often impossible to obtain, particularly in ap- 
plied research. The test constructor is often 
placed in a situation where if any item analy- 
sis is to be employed at all, it must be per- 
formed on a small sample, frequently fewer
than 100 cases.

Item validities, when computed against an 
external criterion, are typically low. Given a
small item analysis sample, the resulting item
validities often cannot reasonably be expected
to reach levels of confidence as rigorous as
the conventional 1% and 5%
levels. The establishment of such rigorous 
standards would therefore be expected to re- 
sult in the rejection of a large number of 
truly discriminating items. 

¹ Presently with Nowland & Schladermundt, Greenwich,
Conn.

² Formerly with Richardson, Bellows, Henry & Co.,
Inc.

³ Although item difficulty and item intercorrelation
also represent significant problems in this area,
this paper shall concern itself only with the problem
of item validity.

This was recently demonstrated in a study
by Feldman (4), although within a rather
narrow range of levels of confidence. He
used the 1%, 2%, and 5% levels as stand- 
ards for item inclusion with high and low cri- 
terion groups of 42 cases each. On cross 
validation, he found that the key containing 
only those items which discriminated at the 
1% level was generally less valid than the 
keys containing all items which discriminated 
at the less rigorous 2% and 5% levels. 

It would be expected that, given a fairly 
large item analysis sample, more of the items 
will show validity exceeding a rigorous level 
of confidence. The establishment of a less 
rigorous standard would therefore be more 
likely to result in a greater proportion of non- 
valid than valid variance being added to the 
scoring key. The problem becomes one of 
striking the most favorable balance between 
the number of truly valid items rejected and
the number of truly nonvalid items selected. 

The purpose of this study was to test the 
general point that an important consideration 
in establishing a standard for item inclusion 
is the size of the item analysis sample available.
More specifically, the hypothesis was
tested that for maximal test validity, the 
smaller the sample size available, the less 
rigorous should be the level of confidence se- 
lected as a standard for item inclusion. Con- 
versely, given a large sample size, maximal 
validity can be achieved by establishing a 
more rigorous standard. 


Method 


Instruments and Population Employed. The 
study was performed using the RBH Supervisory 
Judgment Test (SJT) to predict an intelligence
test criterion provided by the short form of the 
Armed Forces Qualification Test (AFQT). 

The SJT is a test which has been found useful
for the prediction of supervisory success. It consists
of 33 items of four and five alternatives in
which the examinee is presented with a series of 
supervisory problems and asked to choose the 
best and worst alternative for solving each. 

The AFQT is a 25-minute timed intelligence 
test consisting of 45 items covering the areas of 
verbal, mathematical, and spatial abilities. 

As a result of a supervisory selection study 
which had been recently completed (5), a num- 
ber of cases were available in which examinees 
had been administered both the SJT and the 
AFQT. The experimental population consisted 
of 540 first, second, and third line civilian super- 
visors at two United States Army Arsenals. 

Item Analysis. For purposes of item analysis 
the experimental population was randomly di- 
vided into three samples of 80, 150, and 300 
cases. The remaining 10 cases of the 540 were 
not included in the item analysis. They were 
later included, however, in the validation series. 
For each of these samples the following pro- 
cedure was followed: High and low criterion 
groups were designated by selecting the upper 
and lower 27% of the AFQT distribution. The 
percentage of cases in the high and low criterion 
groups who responded to each SJT alternative 
was then determined and the significance of the 
difference between the percentages in the two 
groups was computed. 
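The item analysis step can be sketched as below. The paper does not name the significance test it applied to the difference between percentages, so the large-sample normal approximation with a pooled proportion is assumed here; the function names and the example counts are hypothetical.

```python
import math

def upper_lower_groups(criterion, fraction=0.27):
    """Return index lists for the upper and lower `fraction` of criterion scores."""
    order = sorted(range(len(criterion)), key=lambda i: criterion[i])
    k = round(fraction * len(criterion))
    return order[-k:], order[:k]

def proportion_difference_p(hits_high, n_high, hits_low, n_low):
    """Two-sided p for the difference between two independent proportions.

    Assumed test: normal approximation with a pooled proportion.
    """
    p_high, p_low = hits_high / n_high, hits_low / n_low
    pooled = (hits_high + hits_low) / (n_high + n_low)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_high + 1 / n_low))
    if se == 0:
        return 1.0                              # nobody, or everybody, chose the alternative
    z = (p_high - p_low) / se
    return math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail area

# Invented counts: 15 of 22 high-criterion cases and 7 of 22 low-criterion
# cases marked a given alternative (22 is about 27% of an 80-case sample).
p_value = proportion_difference_p(15, 22, 7, 22)
```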

Construction of Scoring Keys. From the item 
analysis data derived from each of the three 
samples, four plus and minus unit weighted scor- 
ing keys were constructed for predicting the 
AFQT criterion. These keys were composed of
all item alternatives discriminating at and beyond 
the 1%, 5%, 20%, and 50% levels of confidence, 
one key being constructed for each of these four 
confidence levels. A total of 12 scoring keys 
were constructed in all. The number of scored 
alternatives comprising each of these keys is 
summarized in Table 1. 
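A minimal sketch of how plus and minus unit-weighted keys might be assembled at several levels of confidence follows. The data layout, the alternative labels and the statistics are invented for illustration; only the four levels and the unit weighting come from the text.

```python
LEVELS = (0.01, 0.05, 0.20, 0.50)

def build_keys(item_stats, levels=LEVELS):
    """Build one plus/minus unit-weight key per confidence level.

    item_stats maps an alternative label to (difference, p_value), where
    difference is the high-group percentage minus the low-group percentage.
    """
    keys = {}
    for level in levels:
        keys[level] = {
            alt: (1 if diff > 0 else -1)
            for alt, (diff, p) in item_stats.items()
            if p <= level and diff != 0
        }
    return keys

def score(responses, key):
    """Sum the unit weights of the alternatives an examinee chose."""
    return sum(key.get(alt, 0) for alt in responses)

# Invented statistics for four alternatives, for illustration only.
stats = {"1a-best": (18.0, 0.004), "1c-worst": (-9.0, 0.04),
         "2b-best": (6.0, 0.18), "2d-worst": (-2.0, 0.45)}
keys = build_keys(stats)
```

Each less rigorous key contains every alternative of the more rigorous keys plus additional ones, which is why the number of scored alternatives grows as the level is relaxed.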

Validation of Constructed Keys. To validate
the scoring keys which were developed on the
basis of the item analysis, it was essential to employ
samples independent of the samples from
which the keys had been developed. In order to
fulfill this requirement and also to make maximal
use of the available data, a procedure was followed
similar to one recently proposed by Katzell
(7).


Table 1

Number of Scored Alternatives in Each Key *

Level of          Item Analysis Sample Size
Confidence        80        150       300
1%                8         27        55
5%                34        52        96
20%               82        110       143
50%               167       187       221

* The total possible number of scored alternatives,
including “best” and “worst” responses, was 302.


Table 2

Validity Coefficients for the Twelve Keys

Item Analysis                  Level of Confidence
Sample Size     Group*    50%     20%            5%      1%
80              A         .604    .576           .596    [illegible]
                B         .676    .611           .563    [illegible]
                C         .617    .636           .523    [illegible]
                Mean      .653    [illegible]    .561    [illegible]

150             D         .699    .730           .702    [illegible]
                E         .651    .604           .528    [illegible]
                F         .677    [illegible]    .670    [illegible]
                Mean      .676    .677           .639    [illegible]

300             G         .711    [illegible]    .714    [illegible]
                H         .647    [illegible]    .651    [illegible]
                I         .585    [illegible]    .614    [illegible]
                Mean      .650    [illegible]    .661    [illegible]

* Each group is composed of 60 cases.

The cases employed in each of the item analy- 
sis samples were systematically reassigned so that 
the scoring keys constructed from one item analy- 
sis sample were employed to score cases selected 
from the other samples. Thus, for example, the 
cases which were employed in the item analysis 
of the 300 case sample were systematically re- 
distributed to form groups which could be used 
to score the keys which were developed from the 
80 and 150 case samples. The cases from the 
80 and 150 case samples were similarly reas- 
signed. 

Following this procedure, nine independent 
validation groups (designated A through I), each 
containing 60 cases, were formed. Groups A, 
B, and C were assigned to be scored with the 
four scoring keys developed from the 80 case 
item analysis sample; Groups D, E, and F were 
assigned to be scored with the four scoring keys 
developed from the 150 case item analysis sam- 
ple; and Groups G, H, and I were scored with 
the four keys developed from the 300 case item 
analysis sample. The product-moment correla- 
tions of each of the four scores with the AFQT 
criterion were then computed for each of these 
60 case validation groups. 


Results 


The validity coefficients computed for each
of the keys on each of the validation groups
are summarized in Table 2. To test the hypothesis
that the differences among the validities
of the various keys could be attributed
to errors of sampling, an analysis of variance
of the validity coefficients was carried out.
Since it is known that the sampling distribution
of r’s does not meet the assumption of
normality required for analysis of variance,
each r was transformed to its z’ equivalent,
the distribution of which is known to be
approximately normal (3). The analysis of variance
was carried out according to the Type I design
outlined by Lindquist (8) and also discussed by
Edwards (2, Chap. 15). The results of this
analysis have been summarized in Table 3.


Table 3

Analysis of Variance of the Validity Coefficients

Source of Variation                       Sum of Squares    df
Between sample sizes                      .1266             [illegible]
Between groups of same size               .1974             [illegible]
  Total between groups                    [illegible]       [illegible]
Between levels of confidence              .0903             [illegible]
Level of confidence X sample size         .0528*            [illegible]
Pooled groups X level of confidence       .0580             [illegible]
  Total within groups                     [illegible]       [illegible]
  Total                                   [illegible]       [illegible]

* Significant at the 5% level.


The error terms employed in this analysis
bear some discussion. Since the variance between
sample sizes for item analysis was
based upon coefficients computed from independent
samples, the error term which was
employed was the variance between groups of
the same size. The variance attributable to
the interaction between level of confidence
and sample size, however, was not based upon
estimates derived from independent samples,
the four coefficients in any row of Table 2
being based upon the same cases. The error
term which was used in this instance was
therefore the pooled interaction terms for
groups of the same sample size by level of
confidence. When tested against this error
term, the variance attributable to the interaction
between level of confidence and sample
size was significant beyond the 5% level.
The variance attributable to this interaction
was therefore employed as the error term in
testing the significance of the level of confidence
main effect.
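The transformation and the choice of error terms described above can be traced numerically. The sketch assumes that the figures printed in Table 3 are sums of squares with the leading decimal point restored, and that the degrees of freedom follow from the design (3 sample sizes, 3 validation groups per size, 4 levels of confidence); the degrees of freedom are inferred here, not read from the table.

```python
import math

def fisher_z(r):
    """Fisher's z' transformation of a correlation coefficient."""
    return math.atanh(r)          # equals 0.5 * ln((1 + r) / (1 - r))

# Sums of squares as read from Table 3 (leading decimal point assumed).
ss = {"sample_sizes": 0.1266, "groups_same_size": 0.1974,
      "levels": 0.0903, "levels_x_sizes": 0.0528,
      "pooled_groups_x_levels": 0.0580}

# Degrees of freedom inferred from the design, not taken from the source.
df = {"sample_sizes": 2, "groups_same_size": 6,
      "levels": 3, "levels_x_sizes": 6,
      "pooled_groups_x_levels": 18}

ms = {name: ss[name] / df[name] for name in ss}   # mean squares

# Each effect is tested against the error term named in the text.
f_sizes = ms["sample_sizes"] / ms["groups_same_size"]
f_interaction = ms["levels_x_sizes"] / ms["pooled_groups_x_levels"]
f_levels = ms["levels"] / ms["levels_x_sizes"]
```

Under these assumptions only the interaction ratio exceeds its tabled 5% point (about 2.66 for 6 and 18 degrees of freedom), while the two main-effect ratios fall short of theirs (about 5.14 for 2 and 6, and 4.76 for 3 and 6), which agrees with the conclusions stated in the text.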

Only the variance attributable to the in- 
teraction between level of confidence and 
sample size was significant beyond the 5% 
level. Neither of the main effects was statistically
significant. This may be interpreted
to mean that, within the limits of the
present study, there is no one optimal level 
of confidence to be employed as a standard 
for item inclusion. Rather, the optimal level 
of confidence is a function of the sample size 
employed for item analysis. 

Examination of Table 2 would indicate that 
the smaller the sample available for item 
analysis, the less rigorous should be the level 
of confidence employed. In short, the hy- 
pothesis tested was essentially substantiated. 

Discussion 

The results of this study, insofar as they 
may be generalized, indicate that there is no 
one optimal level of confidence which should 
be employed when item analyzing test data. 
Particularly pertinent is the result that such 
arbitrarily designated confidence levels as the
1% and 5% often cannot be expected to result
in maximal cross validities. In many
cases, particularly when the size of the sam- 
ple available for item analysis is small, a 
much less stringent standard may be expected 
to result in higher validities than the more 
conventional 1% or 5% levels. 

Especially striking is the fact that, for all 
sample sizes employed in this study, the 50% 
scoring keys consistently resulted in higher
validities than the keys composed of items 
which discriminated at the 1% level. It 
should be noted, however, that the validities 
of the 50% key based upon the 300 case 
sample had started to shrink although the 
5% and 1% keys showed continuous incre- 
ments in validity as the item analysis sample 
sizes were increased. This would suggest that 
if larger samples had been employed in the 
item analyses the greatest validities would 
have been produced by the 5% and 1% keys. 

It would appear that any arbitrarily chosen 
level of confidence is likely to be a poor stand- 
ard for item inclusion. Levels of confidence, 
if they are to be employed at all, ought to 
consider the sample size available for the 
item analysis. The smaller the sample size, 
the less rigorous should be the level of con- 
fidence required. 

It would seem that standards for item in- 
clusion might profitably be established with- 
out any reference to levels of confidence. 
That is, instead of specifying in advance that 
only items discriminating at, say, the 1% or
5% levels be included in the test, an alternate
procedure is suggested. Such a procedure
would entail the computation of item
validity indices which are independent of the 
sample size upon which the item analysis is 
based, e.g., biserial r, phi coefficient, etc.
These items would then be arranged in de- 
creasing order of validity and a cutting point 
selected above which items would be selected 
for inclusion in the scoring key and below 
which they would be discarded. Since few 
principles are available as to where the opti- 
mal cutting point should be, the decision as 
to what constitutes minimally acceptable item 
validity will probably have to be an arbitrary 
one based upon the judgment of the test con- 
structor. 
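The alternate procedure can be illustrated with the phi coefficient, one of the indices the authors mention. The 2 x 2 counts, the alternative names and the cutting point of .30 below are invented; as the text notes, the cutting point itself remains a matter of the test constructor's judgment.

```python
import math

def phi(a, b, c, d):
    """Phi coefficient for a 2 x 2 table.

    a = high group choosing the alternative, b = high group not choosing it,
    c = low group choosing it,              d = low group not choosing it.
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

def select_by_cutting_point(tables, cut):
    """Rank alternatives by |phi| and keep those at or above the cutting point."""
    ranked = sorted(((abs(phi(*t)), name) for name, t in tables.items()),
                    reverse=True)
    return [name for value, name in ranked if value >= cut]

# Invented counts for three alternatives, 22 cases per criterion group.
tables = {"alt_1": (15, 7, 7, 15), "alt_2": (12, 10, 10, 12),
          "alt_3": (18, 4, 5, 17)}
selected = select_by_cutting_point(tables, cut=0.30)
```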
Only after the items have been selected
should any reference be made to the level of 
confidence at which they discriminate. The 
level of confidence corresponding to the mini- 
mally acceptable standard of item validity 
can then be determined, and the number of 
items exceeding this standard can be com- 
pared with chance expectancies (1). If the 
selected number of item alternatives exceeds 
chance expectancy, it is likely that a scoring 
key composed of these items will continue to 
discriminate if applied to new samples. 
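The comparison with chance expectancy can be made concrete with the counts of scored alternatives reported for the twelve keys. The sketch takes the simplest reading, an assumption of this example rather than the method of reference (1): if no alternative truly discriminated, about a proportion equal to the level of confidence of the 302 alternatives would pass by chance. It ignores the dependence among alternatives of the same item, so it is a rough benchmark only.

```python
TOTAL_ALTERNATIVES = 302   # total scorable alternatives reported with the key counts

# Number of scored alternatives in each key: level -> {sample size: count}
observed = {
    0.01: {80: 8, 150: 27, 300: 55},
    0.05: {80: 34, 150: 52, 300: 96},
    0.20: {80: 82, 150: 110, 300: 143},
    0.50: {80: 167, 150: 187, 300: 221},
}

def chance_expectancy(level, total=TOTAL_ALTERNATIVES):
    """Alternatives expected to pass a level if none truly discriminated."""
    return level * total

# Ratio of observed to chance-expected count for every key.
ratios = {
    (level, size): count / chance_expectancy(level)
    for level, sizes in observed.items()
    for size, count in sizes.items()
}
```

Every key exceeds its chance expectancy on this reading, and the margin is proportionally largest for the rigorous levels and the large sample.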


Received August 31, 1953. 
of Tests


Angus G. MacLean 


and Arthur T. Tait 
In developing a new test or in evaluating an
existing one, the procedure is to administer it 
to a sample of the population for which it is 
intended, and obtain the following statistics: 
Mean; Variance; Reliability; Item difficulties; 
and Item-test correlations. 

In addition to these statistics, since the selec- 
tion of items on the basis of item-test corre- 
lations does not insure homogeneity of test 
content (2), some index of item-value is needed. 
In a recent article (1) the present authors 
suggested: (a) computing the inter-item co- 
variance total for each item; and (b) selecting 
only those items whose covariance totals ex- 
ceed their variances. The former indicates the
contribution of each item to reliability (homo- 
geneity) while the latter points up the contri- 
bution to unreliability (heterogeneity). 

In the same article a method was described 
whereby all the information listed above, plus 
the S-indices (item-selection indices), can be 
obtained in one operation, and with less com- 
putation than is required for computing each 
of these statistics separately and by the usual 
formulas. Accuracy is also improved and du- 
plication of computations eliminated. For the 
sake of exhibiting the rationale and general 
method, the full computational procedure was 
given. However, there are many short-cuts 
and, as the specific purpose varies, some steps 
may become unnecessary. At this point it is 
proposed to describe the most economical 
methods of: (a) effecting the preliminary ar- 
rangement of the data; (b) obtaining the mean, 
variance (standard deviation) and reliability;
(c) obtaining the item difficulties and/or the
mean item difficulty; (d) obtaining the item-test
correlations; and (e) obtaining the selection-indices.
These elements are arranged
sequentially, i.e., each earlier step is necessary
to each subsequent one. Items are assumed
to be scored 1 or 0, and ‘reliability’ refers to
Kuder-Richardson formula 20.


(a) A count is made (by hand or machine)
of the number of cases “passing” each item,¹
denoted by $f_{ii}$, and of the number of cases
passing both of every possible pair of items,
denoted by $f_{ij}$. If there are $n$ items there will
be $n(n-1)/2$ $f_{ij}$’s and $n$ $f_{ii}$’s. These frequencies
are then displayed in the ‘F’-matrix, consisting
of an $n \times n$ table. In row 1 column 1 is placed
$f_{11}$. In row 2 column 2 is placed $f_{22}$. The
diagonal from top left to bottom right is called
the principal diagonal and its elements, if
divided by $N$, the total number of cases, become
the item “difficulties,” usually written
$p_i$, or proportion passing on each item. In
row 1 (corresponding to item 1), all other cell
entries indicate the number of cases passing on
both item 1 and the item corresponding to the
column number. Thus in row 1 cell 5 the entry
will be the number of cases who gave the correct
response to both items 1 and 5. This is
denoted by $f_{15}$. Once every subject’s response
to every item has been punched, the IBM
electronic statistical machine can generate the
F-matrix very quickly. If there are enough
counters, it saves computing time to obtain the
sum of the entries in each row at this time.

(b) Once the F-matrix (which should, by
the way, be symmetrical, i.e., $f_{ij} = f_{ji}$) is obtained,
there is little computing to do. First,
if this is not already accomplished by machine,
obtain the row sums, and then obtain $T$, the
sum of these. Thus $T$ is the sum of all the
entries in the matrix. A quicker way to compute
$T$, if only the mean, variance (S.D.) and
reliability are desired, is to sum the frequencies
on one side of the principal diagonal only,
double this sum and add $\sum f_{ii}$, which must be
obtained separately anyway. It helps to avoid
mistakes if a line is ruled along the principal
diagonal, from the top left corner of the matrix
to the bottom right.

¹ When neither a tabulator nor an IBM Electronic
Statistical Machine is available, it is best not to
construct an F-matrix, but to adopt a procedure which
will be described below.
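The construction of the F-matrix and the shortcut for $T$ can be sketched in a few lines. The response data are invented (five examinees, four items) and the function names are hypothetical; the counting rule and the doubling shortcut are the ones described in (a) and (b).

```python
def f_matrix(responses):
    """Build the F-matrix from 0/1 item scores.

    responses: one list of 0/1 item scores per examinee.
    Entry [i][j] counts examinees passing both item i and item j, so the
    principal diagonal holds the number passing each single item.
    """
    n_items = len(responses[0])
    f = [[0] * n_items for _ in range(n_items)]
    for person in responses:
        passed = [i for i, x in enumerate(person) if x == 1]
        for i in passed:
            for j in passed:
                f[i][j] += 1
    return f

def total_T(f):
    """T by the shortcut: double one triangle and add the diagonal sum."""
    n = len(f)
    upper = sum(f[i][j] for i in range(n) for j in range(i + 1, n))
    return 2 * upper + sum(f[i][i] for i in range(n))

# Invented responses: five examinees, four items.
responses = [[1, 1, 1, 0], [1, 0, 1, 0], [0, 0, 1, 1],
             [1, 1, 1, 1], [0, 0, 0, 0]]
F = f_matrix(responses)
T = total_T(F)
```

For these data the shortcut gives the same $T$ as summing every entry of the matrix, and both equal the sum of the squared total scores of the five examinees.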

It is also necessary to obtain $\sum f_{ii}$ and $\sum f_{ii}^2$;
these are computed simultaneously on modern
desk calculators.

Operations of the type $kx - yz$ or $kx - y^2$
are single operations on modern calculators, so
that a computational entity has come into
being known as $L$, defined as $N^2\sigma^2$; that is, if
$L$ is divided by $N^2$ the variance is obtained.
$L$ is defined by:

$$L_x = N\sum X^2 - \left(\sum X\right)^2. \tag{1}$$

In the Kuder-Richardson formula 20,

$$r_{tt} = \frac{n}{n-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_t^2}\right), \tag{2}$$

where $\sigma_i^2$, the variance of item $i$, is defined by

$$\sigma_i^2 = p_i - p_i^2.$$

But it is unnecessary to divide both numerator
and denominator in (2) by $N^2$, so

$$r_{tt} = \frac{n}{n-1}\left(1 - \frac{\sum L_i}{L_t}\right), \tag{3}$$

where

$$\sum L_i = N\sum f_{ii} - \sum f_{ii}^2$$

and

$$L_t = NT - \left(\sum f_{ii}\right)^2. \tag{4}$$

Then

$$\sigma_t^2 = \frac{L_t}{N^2} \tag{5}$$

and

$$M_t = \frac{\sum f_{ii}}{N}, \tag{6}$$

where $\sigma_t^2$ denotes the variance of the test and
$M_t$ the mean.
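The mean, the variance and Kuder-Richardson formula 20 can be computed directly from an F-matrix using the $L$ quantities. The matrix below is invented (it corresponds to five examinees with total scores 3, 2, 2, 4 and 0 on four items) and the function name is hypothetical.

```python
def summary_from_f(f, n_cases):
    """Mean, variance and KR-20 from an F-matrix via the L quantities."""
    n = len(f)
    diag = [f[i][i] for i in range(n)]
    total = sum(sum(row) for row in f)                            # T
    sum_l_items = n_cases * sum(diag) - sum(d * d for d in diag)  # sum of L_i
    l_test = n_cases * total - sum(diag) ** 2                     # L_t
    mean = sum(diag) / n_cases
    variance = l_test / n_cases ** 2
    kr20 = n / (n - 1) * (1 - sum_l_items / l_test)
    return mean, variance, kr20

# Invented F-matrix for five examinees and four items.
F = [[3, 2, 3, 1],
     [2, 2, 2, 1],
     [3, 2, 4, 2],
     [1, 1, 2, 2]]
mean, variance, kr20 = summary_from_f(F, n_cases=5)
```

No division by $N^2$ is needed for the reliability itself, since the ratio of the two $L$ quantities equals the ratio of the corresponding variances.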


In (2) the factor $\frac{n}{n-1}$ is present because
the sum of the inter-item covariances, or
$(\sigma_t^2 - \sum \sigma_i^2)$, is not precisely the true variance,
$\sigma_\infty^2$, but

$$\sigma_\infty^2 = \frac{n}{n-1}\left(\sigma_t^2 - \sum \sigma_i^2\right). \tag{7}$$

Therefore, since reliability is defined as the
ratio of true to total variance, or that proportion
of the variance which is not attributable
to error,

$$r_{tt} = \frac{\sigma_\infty^2}{\sigma_t^2}. \tag{8}$$

The square of the standard error of measurement
is defined by

$$\sigma_e^2 = \sigma_t^2 - \sigma_\infty^2 \tag{9}$$

$$\sigma_e^2 = \sigma_t^2 - \frac{n}{n-1}\left(\sigma_t^2 - \sum \sigma_i^2\right), \tag{10}$$

but the usual formula, which is equally efficient
provided enough significant figures are
used for $r_{tt}$, is

$$\sigma_e^2 = \sigma_t^2\,(1 - r_{tt}), \tag{11}$$

which, incidentally, demonstrates that

$$\sigma_\infty^2 = r_{tt}\,\sigma_t^2. \tag{12}$$

It may be of interest to note that (5) may
be written

$$\sigma_t^2 = \frac{T}{N} - \frac{\left(\sum f_{ii}\right)^2}{N^2} = \frac{T}{N} - M_t^2. \tag{13}$$


(c) If the separate item “difficulties” are required,
e.g., to arrange the items in order of
difficulty, obtain $\frac{1}{N}$ to sufficient significant
figures and lock it in as a constant multiplier,
then run down the principal diagonal converting
each $f_{ii}$ in turn into $p_i$, i.e.,

$$p_i = \frac{f_{ii}}{N}. \tag{14}$$

If on the other hand only the mean difficulty
is required,

$$M_p = \frac{1}{nN}\sum f_{ii} = \frac{M_t}{n}. \tag{15}$$

Should the variance of item difficulties be
desired, it is clearly obtained by

$$\sigma_p^2 = \frac{n\sum f_{ii}^2 - \left(\sum f_{ii}\right)^2}{n^2N^2}.$$


(d) If item-test correlations (and/or selection
indices) are required, the sum of each row
in the F-matrix should have been recorded,
e.g., in a column to the right of the matrix.
Then first obtain $\sigma_{it}$, the item-test covariance,
as follows:

$$\sigma_{it} = \frac{1}{N}\left(\sum_j f_{ij} - \frac{f_{ii}}{N}\sum f_{ii}\right), \tag{16}$$

where $\sum_j f_{ij}$ denotes the row-total for item $i$.
If difficulties have been recorded in a column,
$p_i$ can be substituted for $\frac{f_{ii}}{N}$. In (16) $\sum f_{ii}$
is, of course, a constant and it is simpler from
a computational point of view to make two
columns out of (16), first locking in $\sum f_{ii}$ and
obtaining the differences and then multiplying
them by the locked-in reciprocal of $N$. If,
however, the items are to be selected by use
of the S-index rather than by selecting the best
$r_{it}$’s, do not use (16) at all, but see section (e)
below.

The item-test correlations are obtained by:

$$r_{it} = \frac{\sigma_{it}}{\sqrt{\sigma_i^2 \cdot \sigma_t^2}}, \tag{17}$$

where $\sigma_i^2$ and $\sigma_t^2$ are already available. This
is the point-biserial correlation between item
and total test.
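The item-test covariances and correlations can be taken from the row totals of an F-matrix as sketched below. The matrix is invented and the function name is hypothetical; an item passed by nobody or by everybody has zero variance and would need to be excluded before the division.

```python
import math

def item_test_correlations(f, n_cases):
    """Point-biserial item-test correlations from an F-matrix."""
    n = len(f)
    diag = [f[i][i] for i in range(n)]
    sum_diag = sum(diag)
    total = sum(sum(row) for row in f)                        # T
    var_test = (n_cases * total - sum_diag ** 2) / n_cases ** 2
    result = []
    for i in range(n):
        p = diag[i] / n_cases                                 # item difficulty
        cov = (sum(f[i]) - p * sum_diag) / n_cases            # item-test covariance
        var_item = p - p * p                                  # assumed non-zero
        result.append(cov / math.sqrt(var_item * var_test))
    return result

# Invented F-matrix for five examinees and four items.
F = [[3, 2, 3, 1],
     [2, 2, 2, 1],
     [3, 2, 4, 2],
     [1, 1, 2, 2]]
r_it = item_test_correlations(F, n_cases=5)
```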

(e) The selection index for item 1, S;, is de- 
fined by 


S; = oi — 20,7. 


(18) 


[For the full explanation see reference (1). | 
If S; is negative, item i should be rejected, the 
more so, the greater its absolute value. How- 
ever, if a; has not been computed, S; is de- 
fined by 


Si = 4 (Lie — 214). (19) 
N? 

The best computational procedure is to obtain,
for each item, the value $(N \sum_j f_{ij} - f_{ii} \sum f_{ii})$
and record it; then obtain each $(N f_{ii} - f_{ii}^2)$
and record it. It will be noticed that the latter
is $L_{ii}$ while the former is $L_{it}$. Then subtract
$2L_{ii}$ from $L_{it}$ and record the difference, with
algebraic sign. Finally, multiply each difference
by $1/N^2$. This last step would not be
necessary if the purpose were to reject all items
with a negative S, whatever the magnitude.
However, it has been found in practice that if
the substantially negative items are eliminated,
so that their rows and columns are removed,
those with small original negative S's may now
have only positive covariances with the remaining
items, and their covariance totals may
now exceed their variances, giving them a positive
S-index. Therefore, the large negatives
should be eliminated first and the indices recalculated.
Also, as n diminishes, the increase in the ratio
$n/(n - 1)$ sometimes more than compensates for the
increase to be gained in $r_{tt}$ by
rejecting one or two small negatives.
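
The elimination procedure of section (e) can be illustrated with a short Python sketch. The helper names, the data, and the cut-off value are hypothetical; the cut-off stands for whatever the analyst regards as "substantially negative", which the text leaves to judgment.

```python
def f_matrix(responses):
    """F[i][j] = number of persons passing both item i and item j."""
    n = len(responses[0])
    return [[sum(p[i] * p[j] for p in responses) for j in range(n)]
            for i in range(n)]


def s_indices(F, N):
    """S_i = (L_it - 2 * L_ii) / N**2 for every item, as in formula (19)."""
    diag_sum = sum(F[i][i] for i in range(len(F)))
    out = []
    for i, row in enumerate(F):
        L_it = N * sum(row) - F[i][i] * diag_sum
        L_ii = N * F[i][i] - F[i][i] ** 2
        out.append((L_it - 2 * L_ii) / N ** 2)
    return out


def prune(F, N, cutoff):
    """Drop items with S below `cutoff`, delete their rows and columns, recompute."""
    kept = list(range(len(F)))
    while len(F) > 1:
        S = s_indices(F, N)
        bad = [i for i, s in enumerate(S) if s < cutoff]
        if not bad:
            break
        keep = [i for i in range(len(F)) if i not in bad]
        F = [[F[i][j] for j in keep] for i in keep]
        kept = [kept[i] for i in keep]
    return kept, F


# Hypothetical data: 6 persons by 4 items; item 3 runs against the other three.
responses = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 0],
             [0, 0, 0, 1], [0, 0, 0, 1], [0, 1, 0, 1]]
N = len(responses)
F0 = f_matrix(responses)
before = s_indices(F0, N)
kept, F1 = prune(F0, N, cutoff=-0.5)
after = s_indices(F1, N)
```

In this small example every item starts with a negative S, but once the one strongly negative item is removed the remaining three turn positive, which is the behavior the paragraph above describes.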


Procedure When Machine Facilities 
are Limited 


Some of the statistics used in the above
formulas are identical with those arrived at in
the ordinary process of obtaining individual
scores and their mean and standard deviation.
T is the quantity usually denoted by $\sum X^2$,
that is, the sum of the squares of the individuals'
total-test scores, and $\sum f_{ii}$ is $\sum X$. The
only other statistics needed are $\sum f_{ii}^2$ and $\sum_j f_{ij}$.
With regard to the first, it is customary, in
developing a test, to obtain the frequency passing
each item in order to compute the item
difficulties, so nothing unusual is called for.
The only extra step required by this method is
obtaining, for each item, the quantity $\sum_j f_{ij}$.

Now, as long as items are scored 1 or 0,
$\sum_j f_{ij}$ is the same as $\sum X_i$, the sum of the
total-test scores of those who gave the correct
response to item i. This is a cross-product sum
and gives rise to the formula

\[ \sigma_{it} = \frac{1}{N^2} \Bigl( N \sum X_i - f_{ii} \sum X \Bigr). \tag{20} \]

$f_{ii}$ is, of course, identical with $\sum x_i$, the sum of the
scores on item i. The other formulas above may similarly be
rewritten, substituting $\sum X^2$ for T, $\sum X$ for
$\sum f_{ii}$, and $\sum X_i$ for $\sum_j f_{ij}$.
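
The equivalence between the row-total of the F-matrix and the sum of the total scores of those passing an item can be checked with a small Python sketch. The function names and data are hypothetical; the covariance is written in the form $(N \sum X_i - f_{ii} \sum X)/N^2$.

```python
def pass_group_sums(responses):
    """For each item, add up the total-test scores of those who passed it."""
    totals = [sum(person) for person in responses]
    n_items = len(responses[0])
    return [sum(t for person, t in zip(responses, totals) if person[i] == 1)
            for i in range(n_items)]


def item_test_covariances(responses):
    """Item-test covariance from scores alone, without building the F-matrix."""
    N = len(responses)
    sum_X = sum(sum(person) for person in responses)          # sum of total scores
    f = [sum(person[i] for person in responses)                # frequency passing
         for i in range(len(responses[0]))]
    return [(N * sx_i - f_i * sum_X) / N ** 2
            for sx_i, f_i in zip(pass_group_sums(responses), f)]


# Hypothetical data: 4 persons by 3 items, scored 1 or 0.
responses = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
sums = pass_group_sums(responses)
covs = item_test_covariances(responses)
```

Each entry of `sums` corresponds to one sort of the cards followed by one addition on the desk calculator.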

The most elementary way to obtain $\sum X_i$
would be to hand-sort the answer-sheets. It
is quicker and easier (and has other advantages,
as will be seen) either to punch item-scores
and total-test scores on IBM cards and
use a mechanical sorter, or to use “needle-sort”
cards, which are punched around the edge by
hand and sorted by inserting a sorting needle.
This last method was recently tried out so as
to determine the amount of time it would consume.
Total scores were written on the cards,
and a sort was made for each item, then the
scores of those who had gotten the item right
were added on a desk calculator. There were
124 items and 100 individuals (1 card for each),
and the whole operation, from punching to the
final summing of the $\sum X_i$'s as a check, took
between 19 and 20 hours, or about 9 minutes
per item. This is an over-estimate of the general
average because, with that number of
items, two cards had to be stapled together for
each individual, and they had to be very carefully
fitted together so that the holes would be
in the right positions. With 100 or fewer
items this time-consuming step would be eliminated
and the sorts would be quicker too;
about 7 minutes per item would be a fair allowance
with N = 100. With a key-punch and a
mechanical sorter the punching and sorting
would be quicker, but the real saving in time
occurs when a tabulator is available for the
adding; here two or two and a half minutes
per item is ample allowance.

The full F-matrix method has, however, cer- 
tain advantages, the most prominent of which 
is this: if any items have to be eliminated, all 
that is required is to delete the corresponding 
rows and columns and to obtain new row sums. 
If one is using $\sum X_i$, on the other hand, the
answer-sheets have to be rescored and new 
total-test scores punched, and the sorting-and- 
summing operation repeated. Thus, the F- 
matrix, though more work at first, is likely to 
be less work on the whole unless the number 
of items is large. Of course, if there is no 
question of eliminating items, but only of evaluating
the merits (or demerits) of the items
present, the $\sum X_i$ method is the more economical,
the greater the number of items involved.
The reason for this is that the time
it requires increases linearly with the number 
of items, while the time required by the F- 
matrix increases as the square—thirty-five 
minutes for a 15-item test, about 4 hours for 
a 50-item test, and probably two days for a 
100-item test. If the matrix can be broken 
down into subtests of not more than 15 items 
each, however, the time required is again a
linear function, roughly half an hour for each
such subtest. If this cannot be done, it is
better to use the $\sum X_i$ method, eliminate all
items with negative S-indices, and repeat the 
scoring, sorting and summing to obtain the 
final item values. 

Received April 5, 1954. 

The purpose of this study was to determine
what, if any, relationships exist between
a problem checklist used in the Student
Counseling Bureau at the University of
Minnesota and the Minnesota Multiphasic 
Personality Inventory. Berdie (1) related 
this same problem checklist to the Minnesota 
Personality Scale and found in his sample that 
students with low scores (indicative of the 
presence of problems) on various sections of 
the Scale tended to indicate related problems 
on the checklist. An added purpose of the 
present study was to compare checklist re- 
sponses with those in Berdie’s study con- 
ducted eight years earlier. 

The problem checklist¹ contains 33 items
and instructs the student to check those which
he has not adequately solved and to double
check those which he wants to discuss with a
counselor.


Procedure 


Checklist responses and MMPI T-scores 
were obtained for 335 men and 125 women 
students counseled at the Bureau during the 
1948-1949 college year. This sample in- 
cluded all college and pre-college students for 
whom complete data were available. Stu- 
dents with MMPI's of doubtful validity were 
eliminated. Cutting scores of 70 on the L 
scale, 80 on the F scale, and 60 on the ? scale 
were used as the validity criteria (2). 


Results 


Checklist Analysis. The most frequently 
checked items (single and double checks com- 
bined) dealt with educational and vocational 
problems. Over 80 per cent of the men and 
70 per cent of the women indicated that they 
were unable to determine what they were best 
able to do; over 50 per cent of both sexes did 
not know what they wanted to do. One reason
for these results may be the fact that the
Student Counseling Bureau is primarily an
educational and vocational guidance center.
Other frequently checked problems were concerned
with job opportunities, duties, and
training requirements and with study habits.

1 A reproduction of the checklist may be found in
Berdie (1).

In the personal-social problem area, over 
30 per cent of both sexes felt that they lacked 
self-confidence. Twenty-five per cent of the 
women felt that they did not have enough to 
talk about in social situations. 

Investigation of single and double checking 
of the more frequently expressed problems 
indicated that the subjects seemed more will- 
ing to discuss their educational-vocational 
problems with a counselor than their per- 
sonal-social problems. They may have per- 
ceived the Student Counseling Bureau mainly 
as a place to obtain help on these kinds of 
problems. Educational and vocational prob- 
lems are probably more socially acceptable 
and personally admissible than those dealing
with personality and social relations. In at- 
tempting to explain this phenomenon, Berdie 
states that: “Reluctance to discuss certain 
types of problems may be due to the fact 
that the students think that nothing can be 
done about (them). . . . They may consider 
their personal problems too private to discuss 
with a relative stranger. . . . When students 
come to the counselor, they come with one 
primary purpose and all other matters may 
appear irrelevant at that time.” 

On the checklist, a significant difference 
(.05 level of confidence) existed between men 
and women on only one item: I have been 
unable to determine what I am best able to 
do. Approximately 82 per cent of the men 
checked it, while only 70 per cent of the 
women did so. Otherwise, the men and 
women were roughly equal on relative per- 
centages of problems checked. It is not 
known whether this is due to the structure of 
the checklist, the actual incidence of such 
problems in these groups, or other unidenti- 
fied variables. 
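
The text does not say which test was used for the difference between the sexes. One common choice is a two-proportion z-test with a pooled standard error, sketched below with the rounded percentages and group sizes reported here (about 82 per cent of 335 men, 70 per cent of 125 women). The function name is hypothetical, and because the inputs are rounded the result need not match the authors' own computation.

```python
from math import sqrt


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)          # pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Rounded figures from the text: men vs. women on the "best able to do" item.
z = two_proportion_z(0.82, 335, 0.70, 125)
significant_at_05 = abs(z) > 1.96                     # two-tailed .05 criterion
```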
Comparison of Checklist Responses. Comparison
of the total percentages of checks in
Berdie’s study with the present investigation 
yielded no significant differences between the 
women in these two samples. However, seven 
significant differences were found on checklist 
problems between the male samples. Signifi- 
cantly greater percentages were found by 
Berdie on two items:²


I usually feel inferior to my associates (.05) 
I do not know how to obtain the money I 
need (.05) 


In the present study significantly greater per- 
centages were found on the following items: 


I am unable to determine what I would like
to do (.05)
I am frequently embarrassed when with
others (.05)
I have so much outside work that I am
neglecting my school work (.05)
I do not know how to take good lecture
notes (.01)
I am not interested in my studies (.01)

2 The numbers in parentheses following the item
indicate the level of confidence.


The above differences may be a function of 
sample sizes and composition (e.g., the load- 
ing in the present study of returned service- 
men), an actual change in student problems 
over a period of time, or of other factors not 
readily apparent. 

MMPI Characteristics of Groups Checking 
Many and Few Problems. The median total 
number of problems checked, regardless of 
their nature, was four for the men and five 
for the women. The male average was 4.8 
with a standard deviation of 22.9; the female 
average was 4.9 with a standard deviation of 
12.4. Thus, the men were nearly twice as
variable in the sheer number of problems
they checked as were the women. On the
basis of these statistics, the male and female 
groups were separated into two groups: the 
“High” group (checking five-or-more problems)
and the “Low” group (checking four-or-less
problems). Critical ratios were computed
between the High and Low groups on
each MMPI scale. The results are presented
in Table 1.

Table 1
Comparisons of Mean T-Scores of High and Low Problem Groups

[Table body not recoverable from this copy. For each MMPI scale the
table gave the mean and S.D. of the Low and High groups, the critical
ratio (CR) of the difference, and the biserial r, separately for men
(Low, N = 190; High, N = 145) and women (Low, N = 63; High, N = 62).]

* Significant at the .05 level of confidence.
** Significant at the .01 level of confidence.
Separate marks flag biserial r's significantly different from zero at
the .05 and the .01 levels of confidence.
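
A critical ratio of the kind reported in Table 1 is usually the difference between two means divided by the standard error of that difference. The sketch below assumes the simple large-sample form for independent groups; the authors do not state their exact formula, the function name is hypothetical, and the means and standard deviations in the example are invented for illustration (only the group sizes come from the study).

```python
from math import sqrt


def critical_ratio(mean_high, sd_high, n_high, mean_low, sd_low, n_low):
    """Difference between means over its standard error (independent groups)."""
    se_diff = sqrt(sd_high ** 2 / n_high + sd_low ** 2 / n_low)
    return (mean_high - mean_low) / se_diff


# Illustrative values only: the T-score means and S.D.s are not from Table 1.
cr = critical_ratio(60.0, 11.0, 145, 55.0, 10.0, 190)
```

A ratio near 1.96 corresponds to the .05 level and one near 2.58 to the .01 level for a two-tailed test with groups of this size.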

Both the male and female High groups 
were significantly higher than the Low groups 
on the F, Pa, Pt, Sc, and IE scales. The F 
scale indicates “faking bad” or inability to 
comprehend the inventory items. The Pa 
scale indicates tendencies toward sensitivity, 
hostility, and difficulty in taking criticism. 
The Pt scale indicates tendencies toward anx- 
iety, indecisiveness, and feelings of inade- 
quacy and insecurity. The Sc scale indicates 
tendencies toward fantasy, shyness, and with- 
drawal. The IE scale indicates tendencies 
toward social introversion (2). 

The male High group was also significantly 
higher than the Low group on the D scale 
(indicating depression, discouragement, or de- 
jection of a situational or prevailing nature) 
and the Pd scale (indicating nonconformity, 
irresponsibility, impulsiveness, and asociality).

Both the male and female Low groups were 
significantly higher than the High groups on 
the K scale. This scale indicates test-con- 
sciousness, defensiveness, and an attitude of 
problem denial. The male Low group was 
also significantly higher than the male High 
group on the L scale, a measure of the degree 
to which the subject may be attempting to 
falsify his scores by always choosing the re- 
sponse that puts him in the most socially ac- 
ceptable light. 

Biserial r’s (see Table 1) for all of the 
above comparisons were significantly greater 
than zero, but the amount of overlap of the 
High and Low groups was too great to enable 
accurate classification into these groupings 
on the basis of MMPI scores alone.
Nor would the number of problems an indi- 
vidual checked be effective in predicting his 
MMPI scores. The significant differences ob- 
tained, then, are chiefly statistical rather than 
practical in nature. Only tendencies for these 
groups may be legitimately pointed out on the 
basis of these differences. It does seem, how- 
ever, that individuals who check many prob- 
lems in this sample tend to have somewhat 
more deviant MMPI profiles than those who 
check few problems, although those who check 
few problems may be denying the existence 
of other difficulties (high K score). 
Checklist Responses of Subjects Grouped
According to Their Highest MMPI Scale 
Score. Another method of treating the data 
was to group the men and women separately 
according to their highest score on the MMPI 
clinical scales. An individual’s highest scale 
score would be the one indicated by the high- 
est “peak” on his MMPI profile, regardless of 
score magnitude. For both men and women, 
approximately 50 per cent of each group 
checked problems 6 and 10 on the checklist. 
These were the most frequently checked items 
for the whole sample, so they are valueless as 
far as differential prediction is concerned. 

Half of the men with highest scores on the 
D, Pt, and IE scales indicated on the check- 
list that they lacked self-confidence. In other 
words, there was a tendency in this sample 
for an admitted lack of self-confidence to 
accompany characteristics assessed by the 
MMPI as depression, anxiety, indecision, 
compulsiveness, feelings of inadequacy and 
insecurity, and withdrawal tendencies. 

Half of the men whose highest score was on 
the Pa scale checked problems related to a 
lack of job information and reading difficul- 
ties. High Pa scores are interpreted as in- 
dicative of sensitive, hostile, and paranoid 
tendencies. 

Half of the women with highest scores on 
the Sc and IE scales stated on the checklist 
that they did not have enough to talk about 
in company. Sc and IE peaks are indicative 
of shy, withdrawing, socially introvertive be- 
havior. Half of the women with Pa peaks 
also indicated that they lack job information 
as did half of the men with the same highest 
MMPI score. 

In general, there seems to be some logical 
correspondence between several of the check- 
list problems and personality characteristics 
as assessed by the MMPI. This relationship 
is more obvious for the D, Pt, IE, and Sc 
scales than it is for the Pa scale. 

Since the number of individuals in most of 
the highest scale groups was so small (median 
N for women = 5; for men = 23), valid in- 
ference from the above results is impossible. 
The data should be considered only as de- 
scriptive of the sample employed and as a 
stimulus for further research. It is conceivable
that with sufficiently large homogeneous
MMPI scale groups, differential problem syn- 
dromes might be found on the checklist. 
Pattern analysis of both the checklist and 
the MMPI (3, 4) and their interrelations 
might also prove to be a fruitful technique. 
The value of such research would be in ob- 
taining stable correlates of personality with 
respect to expressed problems and stated
needs as indicated by the problem checklist. 


Summary 


Analyses of the problem checklist and its 
relations to the MMPI showed that: 

1. The most frequently checked problems 
dealt with educational and vocational diffi- 
culties. 

2. Men students were nearly twice as vari- 
able in the number of problems they checked 
as were the women students, although the 
average number of problems checked by each 
sex was roughly the same. 

3. Over a period of time, the relative per- 
centages of responses on the checklist items 
did not appreciably change for the two sam- 
ples compared. 

4. The subjects seemed initially less re- 
luctant to discuss recognized educational-vo- 
cational problems than recognized personal- 
social problems with a counselor. 

5. Both men and women students who 
checked five-or-more problems on the checklist
(as opposed to those who checked four-or-less)
had statistically, though not practically,
significantly higher mean scores on the
F, Pa, Pt, Sc, and IE scales and significantly
lower scores on the K scale of the MMPI. 
Men students checking five-or-more problems 
also had significantly higher Pd and D scores 
and significantly lower scores on the L scale. 
Biserial r’s for all of the above comparisons 
were significantly greater than zero. 

6. Aside from the most frequently checked 
problems in the whole sample, half of the men 
students with MMPI peaks on D, Pt, and IE 
felt that they lacked self-confidence; half of 
the women students with Sc and IE peaks felt
that they did not have enough to talk about 
in company; half of both men and women 
with Pa peaks indicated a lack of job information,
while these men also checked problems
dealing with reading difficulties. Extreme
caution is needed in generalizing from
these results since the criterion groups were 
too small in most instances for stability or 
validity of results derived from them. These 
data, then, should be considered merely as 
descriptive. 


Received August 21, 1953. 

Facilitating Legislative Research

Legislative behavior has been of periodic
interest to many psychologists. Two methods
of analysis have been used. Either a small
sample of issues is selected and legislators
are compared according to their votes on these
topics (1, 10, 15), or a few legislators are
studied on many topics (5, 6, 7). Such
restricted studies concentrate on a few legis- 
lators and a small number of topics. The 
basic paradigm for these legislative studies 
does not differ radically from the familiar 
sociometric analyses of industrial and social 
psychologists (2, 8). 

A major limitation has been the difficulty 
of tabulating joint voting (9). Associated 
with this weakness are other shortcomings. 
Reliability studies of voting are almost non- 
existent (4). Data are presented in tabular 
form and thus relationships among these data 
remain vague (16, 18, 20, 22). 

This paper reports a method for rapid 
tabulation of such data. In final form the 
data are in a symmetric matrix to which a 
variety of statistics may be applied. 


Procedure 


The official legislative journals provide the 
records from which data are obtained. In- 
formation in these records describes the men, 
their districts, the issues upon which they 
vote, and the roll call votes they cast. Thus,
in our analyses we may control for the legislative
body, time of meeting, topics, chairman, etc.

1 Dr. Gloria Lauer Grace assisted in the design of
these studies. The studies have been financed by the
University Research Board, University of Illinois,
1950-1952, and the All-College Research Committee,
Michigan State College, 1952-1953. Leonard P.
Staugas, Statistical Service Unit, University of Illinois,
designed the wiring of the accounting machine,
Types 402-403. Victor E. Buys, Supervisor of
Tabulating Operations, Statistical Methods Section,
Division of Disease Control, Records, and Statistics,
Michigan State Department of Health, designed the
wiring of the electronic statistical machine, Type 101.
Norma E. Taschner, Tabulating Office, Michigan
State College, and Doris L. Duxbury, Statistical
Methods Section, Michigan State Department of
Health, were most cooperative in permitting the use
of their IBM facilities.

The data are transcribed on standard mark- 
sensing IBM cards. This process is rapid. 
The card accommodates up to 54 simple items 
of data. More than one card may be used to 
transcribe larger legislative bodies. The roll 
calls list men alphabetically, and so men are 
assigned to columns on the card in alphabeti- 
cal order. The content of the topic on which 
the vote is taken may also be coded on the 
card, as may other control information. If 
each vote is coded in chronological order, 
easy reference may be made to the journal to 
check discrepancies. All data are marked on 
the card by electrographic pencil. 

One card is used for each type of vote. If
only split roll call votes are tabulated, this
means a minimum of two cards (affirmative
and negative) and a maximum of four cards
(affirmative, negative, absent, abstain) for
each vote. A 1 is marked in a man's column
on the card which represents the type of
vote he has cast. If he does not vote one
way, he votes another. Therefore, he will
have a 1 in one and only one card for each
vote. The other cards for that vote will be
blank in his column.

Punching the cards is accomplished by ma- 
chine. It is advantageous for comparative- 
historical analyses to have the data arranged 
in a definite, permanent order. A suggested 
order is numerical, according to the number 
of the district represented by the legislator, 
with the First District in column one, and so 
forth. Thus, if men should fail at the polls, 
retire, or die, the position of the district rep- 
resentative is unchanged. When the repro- 
ducing punch is wired for mark-sensing, the 
data may be rearranged from the alphabetical 
order of the men to the numerical order of 
the district. 

The cards are prepared for checking by 
sorting them on the basis of the vote num- 
bers. The accounting machine (Type 402) 
is wired for addition, printing a minor program
total each time the vote number
changes. Each vote is listed with its content
code and the identification of the legislature,
and the total of all the cards for each vote is
printed. See Table 1. If the mark-sensing
and punching are correct, a series of 1's appears
in the columns representing the legislators.
If a zero appears, it means that the
legislator has been overlooked. If a 2 or 
greater appears, the man has been given 
credit for having cast more than one type of 
vote on an issue. Correction of these errors 
may be made by referral to the pencil mark- 
ings on the cards. If the cards have been in- 
correctly marked, reference must be made to 
the journal. The method is remarkably ac- 
curate. The importance of having a machine 
check rather than a hand check cannot be 
overestimated in accuracy and amount of 
time saved. Should subdecks for controls on 
time or content be reproduced from the mas- 
ter deck, it is highly advantageous that these 
also be machine checked. The investigator is 
then assured of a perfect working deck at all 
times. If errors later appear, he knows that
they are a function of the machine operations
and not the cards.
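
The machine check described above amounts to adding the cards for one vote column by column and flagging any column whose total is not 1. A minimal Python sketch follows; the data layout (a dictionary from vote type to a list of marks) and the function name are assumptions made for illustration, not a description of the 402 wiring.

```python
def verify_vote(cards):
    """cards: {vote_type: [0 or 1 per legislator column]} for one roll call.

    Returns the column totals and a dict of faulty columns (1-based) with
    their totals: 0 means the man was overlooked, 2 or more means he was
    credited with more than one type of vote.
    """
    n_columns = len(next(iter(cards.values())))
    totals = [sum(card[c] for card in cards.values()) for c in range(n_columns)]
    errors = {c + 1: t for c, t in enumerate(totals) if t != 1}
    return totals, errors


# Hypothetical roll call for four legislators with two deliberate mistakes.
cards = {
    "affirmative": [1, 0, 1, 0],
    "negative":    [0, 1, 1, 0],
}
totals, errors = verify_vote(cards)
```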

The final process is the tabulation of the 
joint-occurrence matrix.² Two methods are
possible. Either the accounting machine 
(Type 402-403) or the electronic statistical 
machine (Type 101) may be used. The ac- 
counting machine takes about four times as 
long and is liable to greater error than the 
electronic statistical machine. The essential 


Table 1 
Facsimile of the Verification of Voting Data 
(Columns 1-20 represent legislators; a zero [0] or 


two [2] indicates an error for that man on the par- 
ticular vote. ) 


Vote 


Number Legislators 


2 The joint-occurrence matrix will be referred to 
as the jo-matrix. 


Table 2 
Facsimile of the Joint-Occurrence Matrix 

(Column 1 and row 1 identify the legislator from 
district one, etc. The diagonal is constant, showing 
the number of times each man voted. The other cells 
indicate the number of times each man has voted with 
every other man. The symmetry of the matrix indi- 
cates that the data are correctly tabulate.) 


Legislators 


Legislators 1 2 3 5 
167 95 &8 137 
95 167 110 
88 130 97 
74 120 &5 
137 110 7 167 


task is to instruct the machine to record the 
number of times every man votes (has a / 
in his column) with each other man. 

For both machines the cards must be 
sorted one column at a time. All cards which 
have a 1 in the sorted column are fed into
the machine for tabulation of the jo-matrix. 
The matrix must be symmetrical. See Table 
2. This is the check on the tabulation. 
Cards may be summary punched with the 
same totals that appear on the printed forms. 
These summary cards may be useful for fur- 
ther matrix manipulation or for larger sum- 
maries of the data, if these are part of the 
experimental design. 
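
The tabulation just described can be sketched in Python. For every card, each pair of legislators marked on that card has its cell increased by one, which corresponds to sorting on one column and counting the marks in the other columns. The function name and the small data set are hypothetical.

```python
def jo_matrix(cards, n_legislators):
    """cards: iterable of 0/1 lists, one list per (vote, vote type) card."""
    jo = [[0] * n_legislators for _ in range(n_legislators)]
    for card in cards:
        marked = [c for c in range(n_legislators) if card[c] == 1]
        for a in marked:            # the "sorted" column
            for b in marked:        # every column compared with it
                jo[a][b] += 1
    return jo


def is_symmetric(m):
    """The check on the tabulation: cell (a, b) must equal cell (b, a)."""
    return all(m[a][b] == m[b][a] for a in range(len(m)) for b in range(len(m)))


# Hypothetical deck: two roll calls, three legislators, two cards per vote.
cards = [
    [1, 1, 0],   # vote 1, affirmative
    [0, 0, 1],   # vote 1, negative
    [1, 0, 0],   # vote 2, affirmative
    [0, 1, 1],   # vote 2, negative
]
jo = jo_matrix(cards, 3)
```

Dividing every cell by the diagonal value would give the percentage matrix mentioned later in the text.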

The 402 machine allows us to compare as 
many as 12 columns at a time. Each time a 
run is made, the control wire must be moved 
to the column on which the cards have been 
sorted. If the number of men, m, exceeds 12, the wiring must be
changed to pick up from the next set of 12 
columns, 13-24, etc. We then begin sorting 
with column 1 and again run the entire 
gamut. Machines normally emit an impulse 
when two readings are unequal. The prob- 
lem of wiring is to allow an impulse to be 
freed when two readings are equal, i.e., when 
there is a 1 in the sorted column and in any
of the other columns being compared. This 
is accomplished by wiring from the compar- 
ing exit to the pilot selectors’ digit pickup. 
The machine is wired for addition, minor pro- 
gramming, and printing of totals. If a per- 
centage matrix is desired, the reciprocal of 
the total number of votes is emitted into a
counter entry. The number of significant 
digits required for accuracy must be borne in 
mind in computing this reciprocal. 

The 101 machine compares as many as 60 
columns at one time. The deck is sorted on 
one column and all cards with a 1 punch are
run through the machine. The machine will 
print the total jo’s for 60 men with no wiring 
changes necessary. The essential wiring for 
the 101 machine provides that a 1 from the
digit emitters be fed into the recode pickup, 
which has been wired so that the impulse 
passes from column to column. The recode 
selectors are also wired together. A wire runs 
from the count to the recode selectors, and
then from the unit counters to count return. 
At least one subtraction plug must be wired. 
The sort select switch is set at the 2 position. 
If the legislature exceeds 60 men, it is profit- 
able to make enough decks to account for all 
of the men without rewiring the 101 machine. 
If there were 90 men, deck A could list men
from districts 1-30 and 31-60; deck B from
districts 1-30 and 61-90 (in columns 31-60);
and deck C districts 31-60 (in the first 30
columns) and 61-90 (in the second set of 30
columns). The printed matrix is then spliced
together. This method avoids the necessity
for wiring changes and may also be employed
with the 402 machine.
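
The deck arrangement for a legislature larger than the machine's capacity can be generated mechanically: split the districts into blocks of half the capacity and make one deck for every pair of blocks. The sketch below is an illustration only; the function name and the choice of equal half-capacity blocks are assumptions generalized from the 90-man example.

```python
from itertools import combinations


def deck_plan(n_men, capacity=60):
    """Return a list of decks; each deck is a list of (first, last) district blocks."""
    if n_men <= capacity:
        return [[(1, n_men)]]                      # one deck holds everyone
    half = capacity // 2
    blocks = [(start, min(start + half - 1, n_men))
              for start in range(1, n_men + 1, half)]
    return [list(pair) for pair in combinations(blocks, 2)]


# The 90-man example from the text: decks A, B, and C.
plan = deck_plan(90)
```

Every pair of blocks appears together in exactly one deck, so each cell of the full matrix is tabulated at least once before the printed sections are spliced.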

Since each investigator has his special prob- 
lems of design, he will interpret these meth- 
ods to suit himself. The 101 machine is the 
better one for even the smallest matrices. 
Fewer wiring problems are encountered, less 
time consumed, and the report is more readily 
checked. On the other hand, the 402 ma- 
chine is more readily available at present in- 
stallations. 


Discussion 


The application of this method to psycho- 
logical research may be made more explicit. 
This method may be applied to any dichoto- 
mous data. The vote is an excellent example. 
Sociometric choices provide a further major 
field of application. 

Voting analyses and sociometrics have been 
criticized for failing to report reliabilities. 
We often study a handful of votes or administer
one sociometric and hope to describe or
predict behavior. This tabulation technique 
makes possible the study of large numbers of 
votes and sociometric runs. Time or content 
matrices may be compared with their counter- 
parts representing other time or content sam- 
ples. To the degree that the jo’s are similar, 
we may speak of the S’s as being consistent 
and/or our measures as reliable. The appli- 
cation of this method to one legislature has 
been reported in the literature (12). We 
have since applied it to eight others. We 
found that behavior is significantly more 
consistent from issue to issue than from time 
to time. The research possibilities and prac- 
tical applications are as broad as the in- 
genuity of the investigator. 
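
The text does not name the index used to compare one jo-matrix with another. One simple possibility, offered here only as an assumption, is the product-moment correlation between corresponding off-diagonal cells of the two matrices. The function names and the two small matrices are hypothetical.

```python
from math import sqrt


def off_diagonal(m):
    """Cells above the diagonal, row by row (the matrix is symmetric)."""
    return [m[i][j] for i in range(len(m)) for j in range(i + 1, len(m))]


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def jo_agreement(first, second):
    """Similarity of two jo-matrices built from different time or content samples."""
    return pearson(off_diagonal(first), off_diagonal(second))


# Hypothetical jo-matrices for the same four men in two sessions.
session_1 = [[10, 8, 2, 3], [8, 10, 1, 4], [2, 1, 10, 9], [3, 4, 9, 10]]
session_2 = [[12, 9, 3, 4], [9, 12, 2, 5], [3, 2, 12, 8], [4, 5, 8, 12]]
r = jo_agreement(session_1, session_2)
```

Because the correlation is unaffected by a change of scale, the two matrices may rest on different numbers of votes.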

We have alluded to the fruitfulness of hav- 
ing the data in matrix form. The reason for 
this is the development of matrix algebra and 
its application in factor analysis. As these 
statistical techniques become more refined, 
matrix data will assume greater importance. 
A few other possibilities are latent structure 
analysis (factor analysis applied to joint pro- 
portion) (13, 14), the difference method (19), 
matrix squaring or cubing (8), and the application
of information theory (11). In addition
to these refined techniques, clusters may
be arbitrarily selected from the matrix with- 
out such refinement (3, 17, 21). 

Finally, a major value of this method for 
applied studies is the speed with which the 
data may be tabulated. A legislature’s votes 
on any day may be coded, punched, checked, 
and the jo-matrix tabulated overnight. Thus, 
a daily record may be kept of voting blocs. 
Weekly, monthly, or yearly summaries may 
be assembled. Matrices may also be tabu- 
lated according to special-interest legislation. 
In this manner, a legislator, citizenship com- 
mittee, civic interest group, or social scientist 
could have at hand a daily, topical summary of
the policy-body’s voting patterns. A precise 
account of a group’s sociometric development 
could similarly be made. 


Summary 


A method for the quantitative treatment of 
dichotomous data is reported. This IBM 
method proceeds quickly from written records
to matrix form. Cards are mark-sensed
with the data and punched and checked by 
machine. A matrix of joint-occurrences is 
tabulated by either of two IBM machines. 
The matrix has many practical applications. 
The method facilitates rapid, accurate analy- 
sis of political bodies, sociometrics, and other 
social data in dichotomous form. 


Received September 5, 1953. 

In measuring the effectiveness of advertisements
it is of primary importance to be able
to measure the initial attention-drawing power 
or eye appeal of an advertisement. This fact 
is obvious, since an advertisement which is 
not seen cannot accomplish its intended pur- 
pose. One of the methods which has been 
used to accomplish this measurement of at- 
tention-drawing power is eye movement pho- 
tography. This method produces an objec- 
tive photographic record of the eye move- 
ments of a subject while he is observing one 
or several advertisements. If an experimental 
design permitted pairs of advertisements to 
be presented to the subject, the photographic 
record taken by an eye camera would indi- 
cate which of the two advertisements the sub- 
ject preferred to observe. It is possible, then, 
by totaling the preferences of several sub- 
jects to scale a set of advertisements accord- 
ing to their relative attention-drawing power. 

Although the eye movement photographic 
method produces an objective record of the 
subject’s preferences, the method has several 
disadvantages: 

1. The advertisements must be presented 
to one subject at a time and the presentation 
becomes relatively time consuming if large 
numbers of subjects are to be used. 

2. The transportation and assembly of the 
necessary equipment is cumbersome. 

3. The necessary task of frame by frame 
reading of the film record is laborious and 
time consuming. 

It was the purpose of this study to investi- 
gate the possibility that a less time consum- 
ing method of measuring the attention-draw- 
ing power of advertisements will essentially 
scale advertisements in agreement with the 
scaling produced by eye movement photographic
methods. Specifically, the investigation
dealt with the relationship between scalings
produced by a group tachistoscopic


method and scalings produced by the Purdue 
Eye Camera (2). 


Procedure 


Ten advertisements to be scaled were se- 
lected from current issues of popular weekly 
magazines. All advertisements were in color 
and full page in size. The subjects for the 
study were 154 students in college psychology 
classes, education classes, and adult education 
classes. 


Tachistoscopic Presentation. For the tachisto- 
scopic part of the study the ten advertisements 
were reproduced on 35 millimeter colored trans- 
parencies which were individually mounted in 
14,” x 2” cardboard slide mountings. A special 
brass holder for pairs of these mounted transparencies
was designed to fit into the slide carrier
of a standard 3¼” x 4” lantern projector.
It was possible, then, to project together on a
screen any two of the ten advertisement slides.
The brass holders could be slipped in and out of
the slide carrier in the same manner as standard
3¼” x 4” lantern slides. In order to speed up
the presentation two of the brass holders were 
constructed so that one pair of slides could be 
readied while another pair was in position to be 
projected on the screen. 

A tachistoscopic shutter was mounted over the 
front lens of the projector so that the amount of 
time the advertisements were observed on the 
screen could be accurately controlled. In this 
study each possible pair of the ten advertise- 
ments was presented to the subjects for .5 sec- 
onds. Approximately 20 minutes were required 
for the complete set of 45 presentations. 

Immediately following the presentation of each 
pair the subjects were asked to indicate on a pre- 
pared answer sheet which advertisement of the 
two they would look at if they were given a sec- 
ond look. The preferences of each subject for 
each advertisement were then determined from 
the answer sheets. 
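
The paired-comparison bookkeeping described above can be sketched in a few lines. The function names and the simulated subject are hypothetical; the only facts taken from the text are that ten advertisements give 45 possible pairs and that each subject's score for an advertisement is the number of pairs in which it was chosen.

```python
from collections import Counter
from itertools import combinations


def all_pairs(n_items):
    """Every unordered pair of items numbered 1..n_items."""
    return list(combinations(range(1, n_items + 1), 2))


def tally_preferences(choices):
    """Count how often each item was the one chosen from its pair."""
    return Counter(choices)


pairs = all_pairs(10)

# Hypothetical subject who always picks the lower-numbered advertisement.
choices = [min(pair) for pair in pairs]
scores = tally_preferences(choices)

print(len(pairs))              # number of presentations per subject
print(scores[1], scores[10])   # most- and least-preferred advertisement
```

Because every pair yields exactly one choice, a subject's ten scores always sum to 45, which is a convenient check on a marked answer sheet.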

Eye Movement Photography. From the subjects
participating in the tachistoscopic presentation,
36 were randomly selected to return for a
second experiment in which the relative attention-drawing
power of the ten advertisements was
measured by use of the Purdue Eye Camera (2).
This camera consists of a table stand on which
two actual advertisements are placed, a half-silvered
mirror placed directly in front of the
stand, and an eight millimeter motion picture
camera mounted in front of and above the mirror.
A motor drive is used to keep the camera speed
constant at 2.7 frames per second. The subject
is seated in front of the stand and is able to
view the advertisements through the half-silvered
mirror. The reflection on the mirror of the upper
part of the subject’s face is photographed by
the motion picture camera.¹


Table 1
Mean Number of Preferences for Random Halves of
the Subjects Using a Group Tachistoscopic Presentation

                 Mean Number of Preferences for
                       Halves of Subjects
Advertisement    Random Half A    Random Half B
      1              7.10             7.09
      2           [illegible]         6.04
      3              5.70             5.22
      4              5.26             5.43
      5           [illegible]         4.23
      6              3.92             3.93
      7           [illegible]         3.78
      8           [illegible]         3.17
      9           [illegible]         3.05
     10           [illegible]         2.45

It is possible, by projecting the produced film 
a single frame at a time, to identify which ad- 
vertisement the subject is looking at in each 
frame. A 35 millimeter strip film projector 
which had been converted for use with eight 
millimeter film was available for this single frame 
projection. A count of the number of frames 
during which the subject’s eyes were fixed on a 
particular advertisement gives a measure of the 
amount of time in which the subject was looking 
at that advertisement. This amount of time 
spent on an advertisement was used as one meas- 
ure of attention-drawing power. Another meas- 
ure of attention-drawing power, the total first 
fixations on a particular advertisement, was also 
used. This was a measure of the number of 
times a particular advertisement was looked at 
first by the subjects during the paired presenta- 
tion. 

If each subject was to view all possible pairs 
of ten advertisements, it would be necessary to 
present 45 pairs and each advertisement would 
be presented nine times. In the eye camera ex- 
periment, however, it was felt that each subject 
should view each advertisement only once. In 
order to accomplish this only five pairs were presented
to each subject. In this manner nine subjects
were needed to complete each total pairing.
The 36 subjects used in the eye camera experiment
actually represented four complete pairings
of the ten advertisements.


¹ A more detailed description of the camera can be
found in an unpublished thesis by Karslake (3), and
in an article by the same author (2).
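
One way to arrange the 45 pairs so that each subject sees every advertisement exactly once is a round-robin schedule. The article does not say how its pairs were assigned to subjects, so the circle method below is an assumption offered only to show that nine sets of five disjoint pairs do cover all 45 pairings.

```python
def round_robin(n_items):
    """Circle method: n_items - 1 rounds, each made of n_items / 2 disjoint pairs."""
    if n_items % 2:
        raise ValueError("n_items must be even")
    items = list(range(1, n_items + 1))
    half = n_items // 2
    rounds = []
    for _ in range(n_items - 1):
        # Pair the first item with the last, the second with the next-to-last, etc.
        rounds.append([tuple(sorted((items[i], items[-1 - i]))) for i in range(half)])
        # Keep the first item fixed and rotate the others by one place.
        items = [items[0], items[-1]] + items[1:-1]
    return rounds


schedule = round_robin(10)          # one round per subject in a complete pairing
for subject, pairs in enumerate(schedule, start=1):
    print(subject, pairs)
```

Under this scheme nine subjects make up one complete pairing, and the 36 subjects of the eye camera experiment would correspond to four repetitions of the schedule.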


Results 


Reliability. From the results of the ta- 
chistoscopic presentation the preferences of 
all 154 subjects for each advertisement were 
totaled. The subjects were then randomly 
split into two groups and the product mo- 
ment correlation (1) between the total pref- 
erences of these groups was computed and 
used as a measure of the reliability of the 
tachistoscopic method. The split-half cor- 
relation found was .98. When the Spearman- 
Brown formula (1) was applied to find an 
estimate of the expected correlation for dou- 
ble the number of judges, an r of .99 was 
found. In Table 1 the mean preferences for 
each half of the subjects are shown. 
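
The step from the split-half correlation to the estimate for double the number of judges follows the usual Spearman-Brown form, r' = k r / (1 + (k - 1) r), with k = 2. The sketch below assumes that general form (the article cites the formula without printing it) and applies it to the reported split-half value.

```python
def spearman_brown(r, factor=2.0):
    """Estimated reliability when the number of judges is multiplied by `factor`."""
    return factor * r / (1.0 + (factor - 1.0) * r)


stepped_up = spearman_brown(0.98)   # split-half r reported for the tachistoscopic method
print(round(stepped_up, 2))
```

The same function can be applied to the split-half correlations reported for the two eye camera measures.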

In order to investigate the reliability of the
first eye camera measure, the first looks at
each advertisement by random halves of the
subjects were totaled. The r between these
halves was .58. The Spearman-Brown formula
established the r for double the number
of judges to be .73. The mean number of
first looks for random halves of the subjects
is shown in Table 2. In the second eye
camera method the first ten frames of film
showing the subject looking at a pair of
advertisements were considered. The number
of frames in which the subject looked at
a particular advertisement was first determined,
and from this the total number of
frames in which random halves of the subjects
looked at each advertisement was totaled.
Since the eight millimeter camera was
motor driven and its speed constant at 2.7
frames per second, it is possible to convert
the total frames measure into total seconds
looked at a particular advertisement. In
Table 3 the mean number of seconds spent
by random halves of the subjects on each
advertisement is shown. The r between the two
halves for this method was found to be .50.
The Spearman-Brown estimate of reliability
was .67.


Table 2
Mean Number of First Looks for Random Halves of
the Subjects Using the Purdue Eye Camera

                 Mean Number of First Looks for
                       Halves of Subjects
Advertisement    Random Half A    Random Half B
      1              6.50             5.50
      2              6.00             8.00
      3              5.00             6.00
      4              6.50             4.00
      5              2.00             3.50
      6              3.50             2.50
      7              3.50             4.00
      8              4.50             2.50
      9              3.50             4.00
     10              4.00             5.00

Comparison of the Different Methods. In
order to determine the relationship between
the different methods of measuring the attention-drawing
power of advertisements, product
moment correlations were computed between
the relative attention values found by
the tachistoscopic presentation method and
each of the two eye camera measures. In
Table 4 the mean number of tachistoscopic
preferences, first looks, and seconds spent on
each advertisement are shown for all subjects.
The correlation between the results of
the tachistoscopic and eye camera, first look,
methods was found to be .79. A correlation
of .83 was found between the results of the
tachistoscopic preferences and the number-of-seconds-spent
measure of the eye camera
presentation.


Table 3
Mean Number of Seconds Spent by Random Halves of
the Subjects Using the Purdue Eye Camera

                 Mean Time in Seconds Spent by
                       Halves of Subjects
Advertisement    Random Half A    Random Half B
      1              21.3             18.7
      2              21.8             18.3
      3              18.9             17.8
      4              15.6             21.3
      5              13.1             13.0
      6              15.6             17.4
      7              15.4             13.7
      8              13.3             15.0
      9              16.9             15.6
     10              15.4             15.9


Joseph Tiffin and Darvin M. Winick


Table 4
Mean Number of Tachistoscopic Preferences, First
Looks, and Seconds Spent for All Subjects

                 Mean Number    Mean Number    Mean Number
                   of Tach.       of First      of Seconds
Advertisement    Preferences       Looks          Spent
      1              7.10           6.00           20.0
      2              6.59        [illegible]       20.1
      3              5.46        [illegible]       18.3
      4              5.34        [illegible]       18.1
      5              4.03        [illegible]       13.0
      6              3.93        [illegible]       16.5
      7              3.72        [illegible]       14.5
      8              3.19        [illegible]       14.1
      9              3.09        [illegible]       16.2
     10              2.55        [illegible]       15.6
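
The product moment correlation between two of the Table 4 columns can be recomputed directly. The sketch below uses the tachistoscopic preference and seconds-spent columns as they appear in Table 4; the function itself is the ordinary Pearson formula and is not taken from the article.

```python
from math import sqrt


def pearson_r(x, y):
    """Product moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Columns of Table 4, advertisements 1 through 10.
tach_preferences = [7.10, 6.59, 5.46, 5.34, 4.03, 3.93, 3.72, 3.19, 3.09, 2.55]
seconds_spent = [20.0, 20.1, 18.3, 18.1, 13.0, 16.5, 14.5, 14.1, 16.2, 15.6]

r = pearson_r(tach_preferences, seconds_spent)
print(round(r, 2))
```

Rounded to two places, the result agrees with the .83 reported for these two measures.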

Since the reliability of the two eye camera 
measures was lower than the reliability of the 
tachistoscopic measure, probably due pri- 
marily to the small number of complete pair- 
ings, it would be of interest to know what 
the correlations between the tachistoscopic 
results and each of the two eye camera measures
would be if the latter were perfectly reliable.
These correlations can be estimated
by correcting for attenuation due to the im- 
perfect reliability of the eye camera (cri- 
terion) measures (1, p. 530). These cor- 
rected correlations were .86 between the ta- 
chistoscopic measures and the eye camera, 
first look, measures; and .99* between the 
tachistoscopic results and the eye camera 
“number of seconds spent” measures. 
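
A common form of the correction for attenuation in the criterion alone divides the obtained correlation by the square root of the criterion's reliability. The article does not print the formula or its arithmetic, so the sketch below assumes that form and uses hypothetical input values; it is not meant to reproduce the figures reported above.

```python
from math import sqrt


def correct_for_criterion_unreliability(r_xy, r_yy):
    """Estimate r_xy as it would be if the criterion y were perfectly reliable."""
    if not 0.0 < r_yy <= 1.0:
        raise ValueError("criterion reliability must lie in (0, 1]")
    return r_xy / sqrt(r_yy)


# Hypothetical values chosen only to illustrate the behavior of the formula.
ordinary_case = correct_for_criterion_unreliability(0.60, 0.64)
above_unity = correct_for_criterion_unreliability(0.90, 0.64)
print(ordinary_case, above_unity)
```

The second call shows how a high obtained correlation combined with a modest reliability estimate can yield a corrected value greater than unity.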


Summary and Conclusions 


In this investigation it was found that the 
attention-drawing power of advertisements 
can be scaled by the paired-comparison ta- 
chistoscopic method with a reliability of .99. 
Correlations of .86 and .99 were found be- 
tween the tachistoscopic method and two eye 
camera measures. These r’s were the result 
of correcting the obtained correlations for the 
unreliability of the eye camera measures. 

* Actual arithmetic results in an r of 1.03. For a
discussion of r’s greater than unity, see McNemar
(4, p. 136).





Attention-Drawing Power of Magazine Advertisements 


The relationships indicate that the group ta- 
chistoscopic method as used in this study will 
scale advertisements in essentially the same 
order as eye camera methods when attention- 
drawing power is considered. This fact is 
important to people interested in advertising 
research for several reasons: 

1. The tachistoscopic method lends itself 
easily to group presentation and enables large 
numbers of subjects to be reached. 

2. A standard, easily transportable, slide 
projector is the only equipment needed to 
make the tachistoscopic presentation. 

3. Preferences on prepared answer sheets 
may be quickly totaled by hand or machine 
methods. 

In situations where eye movement photography
could be used to measure the attention-drawing
power of advertisements, the results
of this study indicate that a considerable
saving of time and energy can be effected by 
use of a group tachistoscopic presentation. 


Received June 19, 1953.

Applied Psychology in Action


Legal Status of Advertising and Marketing Psychology Experts 


An important U. S. District Court decision 
by Judge Robert C. Bell, District of Minne- 
sota, was handed down on September 4, 1953 
at St. Paul, Minn. The Court admitted the 
testimony of two experts in the field of ad- 
vertising and marketing psychology. These 
experts had been engaged by the U. S. Food 
and Drug Administration to interpret adver- 
tising copy and to determine its impact on a 
sample of 200 prospective purchasers of a 
drug. As a result of the success attained it 
is probable that advertising and marketing 
psychologists will be increasingly used in 
prosecutions involving the fraudulent and 
misleading use of labels and advertisements 
in the marketing of drugs and foods. 

The case revolved around a full page news- 
paper advertisement of “Tryptacin.” The 
label on the drug itself did not contain direc- 
tions for use in the treatment of stomach 
ulcers although this is a condition for which 
the drug is intended and for which the drug 
is suggested and recommended in its adver- 
tising. 

The defense contended that the advertise- 
ment represented only that “Tryptacin” is 
intended for use as an antacid or a palliative 
for acid pain. The defense evidence consisted 
of the testimony of two representatives of a 
firm which handles “Tryptacin” advertising 
and the testimony of a number of physicians. 
The two advertising men testified that, in 
their opinion, the advertisement offered 
“Tryptacin” as a means of relieving acid 
pain and not of curing stomach ulcers. They 
also testified that they had shown the adver- 
tisement to a number of their associates in 
the advertising business, to newspaper censor- 
ship boards, and to other persons and not 
a single person received the impression that 
the advertisement offered a cure for stomach 
ulcers. The physicians who testified for the 
defense stated that they had discussed the 
advertisement with doctors, nurses, patients, 
and other persons and again no one got the 
idea that the product would cure stomach 
ulcers. The Court noted that these witnesses 


did not offer any written evidence concerning 
their interviews. Furthermore, it did not ap- 
pear to the Court that the interviews were 
systematically conducted. The Court went 
on to state: “The likelihood of error or 
prejudice developing in the course of such 
interviews would seem to be great, particu- 
larly since none of the witnesses of claimant, 
including both advertising men and doctors, 
were qualified by education or experience in 
the taking of formal public opinion surveys.” 
(Italics added) 

Judge Bell based his decision on: (1) 
reading and examining the advertisement; 
(2) hearing the testimony of two experts in 
the field of advertising and marketing psy- 
chology (Howard P. Longstaff of the Uni- 
versity of Minnesota and James N. Mosel 
of George Washington University); (3) the 
testimony of two persons who purchased the 
drug in the belief that the advertisement of- 
fered a cure for stomach ulcers; and (4) the 
testimony of a specialist in internal medicine 
who has treated many cases of stomach ulcers 
and who testified that in his opinion the ulcer 
patient would get the impression from the 
advertisement that the drug was offered as a 
cure for stomach ulcers. 

The Court commented on the testimony of 
Longstaff and Mosel as follows: “they pre- 
sented exhaustive analyses of the content of 
the advertisement and the effect which it 
was intended to have upon the prospective 
purchaser of the drug. Such testimony is ad- 
missible to determine the meaning of an ad- 
vertisement. Federal Trade Commission v. 
National Health Aids, Inc., 108 F. Supp. 340 
(D. Md.). 

“Moreover, Dr. Mosel introduced evidence 
relative to two hundred individuals whom he 
surveyed concerning the impression which 
they received from the ‘Tryptacin’ advertisement.
A substantial portion of those interviewed
indicated that they received the impression
from the advertisement that ‘Tryptacin’
would ‘stop,’ ‘cure’ or otherwise bring
about some permanent relief of ulcers. The
forms filled out by the individuals questioned, 
interview cards, and tabulations made by Dr. 
Mosel of the answers received, were placed 
in evidence.” 

The Court thereupon upheld the seizure of
363 cases, more or less, of the drug and
assessed the costs of the judicial proceedings
against the defense. Source: Letter dated
September 23, 1953 from Division of Regulatory
Management of the Food and Drug Administration
together with enclosures consisting
of Findings of Fact and Conclusions of
Law and Memorandum Opinion. 


Reporting Employment Test Scores to Supervisors * 


Clifford E. Jurgensen 


Ass’t Vice President—Personnel, Minneapolis Gas Company 


One of the persistent problems in the field 
of Industrial Psychology is that of reporting 
employment test scores to persons untrained 
in the field of tests and measurements. Such 
persons include supervisors, top management, 
and on occasion, perhaps, the applicant him- 
self. It is simple enough to advise that test 
scores should not be given persons untrained 
in test interpretation. In actual practice, 
however, such advice must often be ignored. 

Training in test interpretation can and 
should be given insofar as is possible. How- 
ever, such training cannot possibly reach all 
persons involved. Further, it is unlikely that 
training can be sufficiently intensive and ex- 
tensive to train adequately any of the persons 
involved. Therefore, it is necessary and de- 
sirable to simplify test score interpretation to 
the greatest possible extent. 

The procedure discussed here consists of a 
profile chart on which percentile scores are 
plotted on a linear continuum. The chart, 
shown in Figure 1, is based on normal prob- 
ability tables in which percentile ranks are 
plotted in accordance with z-score units. 
These units effectively overcome the difficulty 
presented by the fact that the difference be- 
tween the 90th and 99th percentiles is not 
equivalent to the difference between the 40th 
and 49th percentiles. 
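
The unequal spacing of percentile ranks on a z-score scale can be checked with the normal distribution in Python's standard library. This is only an illustrative sketch of the principle behind the chart; the function name is hypothetical and the chart itself was drawn from printed normal probability tables.

```python
from statistics import NormalDist


def percentile_to_z(percentile):
    """z-score below which the given percentage of a normal distribution falls."""
    return NormalDist().inv_cdf(percentile / 100.0)


high_gap = percentile_to_z(99) - percentile_to_z(90)
low_gap = percentile_to_z(49) - percentile_to_z(40)
print(round(high_gap, 2), round(low_gap, 2))
```

On the z-score scale the distance from the 90th to the 99th percentile comes out several times larger than the distance from the 40th to the 49th, which is why X's plotted on such a chart can be compared by eye.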

* This material contains the gist of a part of the
presentation by Jurgensen in a panel discussion on
“Philosophy of Testing” before the Minneapolis Vocational
Guidance Association on April 29, 1954.


Although carefully designed experiments 
with adequate controls are lacking, experience 
indicates that lay people tend automatically 
to make reasonably correct interpretation of 
scores inasmuch as they are likely to inter- 
pret scores on the basis of where the X’s ap- 
pear on the profile. For example, it is not 
uncommon to hear remarks such as “His 
score on mechanical reasoning is about half 
way between his highest and lowest scores.” 
Such remarks are based on profile plotting 
(and therefore z-scores) and do not corre- 
spond to an average percentile rank. 

Although it has been found that lay people
typically interpret scores graphically, and
therefore linearly insofar as standard scores
are concerned, the profile does give two verbal
interpretations to facilitate communication or
record purposes. One of these is the well
known percentile rank which is labeled on the
profile as “per cent of group having lower
score.” The other is a general verbal interpretation
of the score in terms of commonly
used adjectives. A column headed “Test or
Measurement” is used to give the type of test
in functional terms rather than the name of
the specific test. A column headed “Basis of
Comparison” is used to give the norm group
on which test scores are profiled.

The profile chart mentioned above is a
simplification of a similar chart used within
the Personnel Department with persons
trained in test interpretation. This original
chart permitted interpretation on four, rather
than two bases. These are: group description,
stanine, standard score, and percentile
rank. Instead of a single column labeled
“Basis of Comparison,” the original chart
contained four columns. These consisted of
raw score, transmuted score (percentile rank,
standard score, stanine, or other such score),
norm group, and a fourth column that could be
used for any additional data desired. This
original, and more complicated, chart contains
the same advantages as the simplified
version insofar as interpretation is based on
location of X. However, although terms such
as stanine and standard score do not affect
score interpretation, lay people feel uneasy
about a chart which they do not fully comprehend.
The simplified profile has therefore
been found to contain all of the advantages
without containing the disadvantages
of the original chart.


Figure 1. Test Profile Report Form at the top is non-technical for use with
supervisors, applicants, etc. The Report Form at the bottom is for technical
uses within the Personnel Department.





Book Reviews 


Marketing and Social Research Division of 
the Psychological Corporation. The meas- 
ured effectiveness of employee publications. 
New York: Association of National Adver- 
tisers, Inc., 1953. Pp. 109. $10.00. 

Here is a well designed study, the results 
of which are reported in a beautiful, litho- 
graphed brochure replete with illustrations, 
simple tables of results expressed as percent- 
ages, a minimum of verbiage, and a maxi- 
mum of white spaced margins. The overall 
page size is 14 inches by 11 inches. Presum- 
ably this is the kind of expensive and ex- 
pensive-looking report that consultants and 
consulting organizations believe will be read 
by top brass in business and industry. It is 
in decided contrast to the form of report 
adopted by scholars in reporting the detailed 
results of quantitative studies in the scientific 
journals and monographs. The very form of
this report, in this reviewer’s opinion, poses a
practical psychological problem: do high level
executives really prefer this type of advertising
lay-out report? Is the stereotype of the
“busy executive” correct which assumes that 
what is put before him for his serious atten- 
tion and study must be presented in a form 
so that “he who runs will read”? 

But let’s get on with a review of the con- 
tents of this report. As the sub-title states, 
it is a study of readership, penetration, and 
readability of seven employee publications. 
The sponsor was the ANA Public Relations 
Committee and the study itself was made by 
the Psychological Corporation with Charles 
L. Vaughn serving as technical director. It 
was aimed at finding out what employees will 
read and believe. 

A foreword, an introduction dealing with 
the role of employee publications in the total 
area of business communications, the objec- 
tives of the study, the methods used in the 
investigation, and a general summary of re- 
sults are then followed by a pictorial, tabular, 
and verbal description of the detailed results. 

In general, the results show the employee 
publication to be one of the best of available 
sources of information about the company,
better than such sources as the first-line supervisor,
the union steward, and meetings.
An incredible 97 per cent of the 1,800 in- 
plant interviews indicate belief in what they 
read in the company magazine or newspaper. 
Readership was likewise quite high, namely 
90 per cent reported they had read at least 
one of the two most recent issues and, on the 
average, 78 per cent reported that they read 
the publications regularly. Thoroughness of 
readership, however, was much less. 

The industrial psychologist and the ap- 
plied social psychologist will be especially in- 
terested in the reported relationships between 
“leftist” and “rightist” attitudes of em- 
ployees interviewed and the extent of their 
readership and in the Flesch readability 
scores of these seven publications. In regard 
to the latter, as is usual, the publications are 
written at a level of difficulty that is too high 
for over one-third of the rank-and-file em- 
ployees. Of more importance, little relation- 
ship between readership and readability was 
found. This finding, however, may be re- 
garded as throwing doubt on the method of 
measuring readership which was used rather 
than as evidence tending to discredit the im- 
portance of simplified language in reaching 
employees with limited amounts of education. 

The reviewer has little to criticize with re- 
spect to what is actually presented in this 
report. Furthermore, there are many com- 
mendable features such as the frankness of 
the plant-by-plant comparisons, the wealth of 
pictorial illustrations of “good” and “poor” 
features of these company publications, and 
the reproduction, in the appendix, of the in- 
terview schedule used to measure readership 
and attitudes. It is obvious that high-level 
professional competence is reflected. However,
one misses any reference to other relevant
studies to which the scholarly business
executive could turn for further information
if he so desired. The reviewer suspects
there are many more studious business executives
than the advertising fraternity responsible
for the form of the present report
would believe possible. Finally, it would
have been of value to the serious student of 
industrial communications to give the sta- 
tistical constants such as means and stand- 
ard deviations, and coefficients of correlation 
in an appendix so that findings in the present 
study might be compared with reports of 
similar scientific studies. 


Donald G. Paterson 
The University of Minnesota 


Comment on Preceding Review 


Paterson’s remarks in regard to the elabo- 
rateness of the presentation accentuate the 
rather interesting differences in frames of ref- 
erence. After considerable discussion, we de- 
cided to make the publication simple because 
we felt that executives were tiring of the 
glossy four-color jobs! Actually, only the 
general summary and preceding page were 
thought to be of interest to top management, 
and these two pages will probably be printed 
separately for that group. 

We were rather concerned with the low and 
occasionally negative correlations between 
Flesch scores and readership. The explana- 
tion, I believe, is to be found more in the na- 
ture of the articles than in the weaknesses of 
the formula or the measures of readership, 
although neither is perfect by any means. 

What happens, I suspect, is that the timely 
and important material may often be hastily 
written at the last minute, and may come 
from persons not concerned with writing 
readably. “Dravo Bids,” a feature illus- 
trated in our publication, actually tells the 
workers indirectly whether they are going to 
have jobs or not, yet it abounds with long 
words and big figures. I can verify from intensive
interviewing of my own that even the
poorly educated “sand hogs” literally pore 
through it. There are other similar features. 

The travesty is that compelling material 
written at a very difficult level may lead the 
inept reader to some rather bizarre conclu- 
sions indeed. 


Charles L. Vaughn 
The Psychological Corporation 




Anon. Army personnel tests and measure- 
ments. TM12-260, Department of the 
Army. Washington, D. C.: U. S. Govern- 
ment Printing Office, 1953. Pp. 125. $.55. 

This is a good little summary of the use of
tests and rating procedures in the Army. It 
reads much like a standard text on employ- 
ment psychology condensed and written down 
to the level of readers without a psychology 
background. For psychologists in the Army 
it might serve as a useful refresher and al- 
most approximates a manual. For other 
Army personnel needing some familiarity with 
the field, it would be very helpful if read 
carefully and, preferably, with an elementary 
statistics text on the side. The monograph 
covers test construction, criteria, scoring of 
tests (especially standard scores), reliability 
and validity, the use of profiles in classifica- 
tion, achievement tests, self-description and 
rating scales (including forced choice), test 
administration and scoring. 

The work has a number of commendable 
features. It is concise and there is not a 
word wasted. Effective use is made of 
graphic materials—some of them quite in- 
genious. There is interesting adaptation of 
military terminology to conventional psycho- 
logical presentation. For instance, reliability 
and validity are interpreted in terms of “cal- 
culated risks.” The treatment is down to 
earth and practical, but entirely scientific 
withal. 

There is always the problem of how to 
handle statistics in a work like this. The 
present authors employ conventional statisti- 
cal terminology, but do not indicate how any- 
thing is computed. There is a frequent sug- 
gestion that “any statistics book” covers 
some particular item. The authors do about 
as well as could be done under the circum- 
stances with brief explanations of some sta- 
tistical notions and graphic materials to 
clarify the explanation. According to an in- 
sert the major responsibility of the work ap- 
pears to have been carried by Baier, Bayroff, 
and Rundquist. They are to be congratu- 
lated on having done an interesting and use- 
ful minor piece of work. 


Harold E. Burtt 
The Ohio State University 







Buros, Oscar K., editor. The fourth mental 
measurements yearbook. Highland Park, 
N. J.: The Gryphon Press, 1953. Pp. xxiv 
+ 1163. $18.00. 


Most reviews of earlier editions of the 
Mental Measurements Yearbook have begun 
with accolades. The reviewer of this latest 
edition sees no reason to deviate from this 
course: Buros’ Fourth Mental Measurements 
Yearbook is a monumental work, even longer 
than the previous edition and of inestimable 
value to purveyors and users of information 
about tests. “The (825-page) section ‘Tests
and Reviews’ lists 793 tests, 596 original re- 
views by 308 reviewers, 53 excerpts from test 
reviews in 15 journals, and 4,417 references 
on the construction, validation, use, and limi- 
tations of specific tests. . . . The (267-page) 
section ‘Books and Reviews’ lists 429 books 
on measurement and closely related fields and 
758 excerpts from book reviews in 121 journals.”
The series of detailed indexes remains
an excellent feature of the volume.

Projective tests, aptitude test batteries, and
tests for specific vocations all receive notice- 
ably more attention in the present volume 
than in the Third Yearbook. Some 19 projective
tests are mentioned for the first time in
the yearbook series and 631 new journal ref- 
erences on the Rorschach (one-seventh of the 
total number of journal references on tests) 
bring the total in the yearbook series to an 
impressive 1,217. The one page devoted to 
three aptitude test batteries in the Third 
Yearbook has become 37 pages devoted to 
nine such batteries in the current work. That 
the aptitude test battery is a relatively recent 
development is made clear by the post-World 
War II dates on seven of the nine batteries. 
That 45 tests for specific vocations are listed 
in the current yearbook, as opposed to 10 in 
the Third Yearbook, is partly the result of the 
only recently won permission to review sev- 
eral of these tests, partly a reflection of the 
continued efforts of professional schools to 
improve selection procedures. 

Past reviewers have argued for changes in 
editorial policy, notably for the exclusion of 
tests which do not meet certain predeter- 
mined criteria. The present reviewer chooses 
to concern himself with only one aspect of 
editorial policy: the exclusion of tests thor- 
oughly reviewed in previous yearbooks for 
which there has been no new edition since 
the last yearbook. 

Unless it can be assumed that all yearbook 
users know they must also consult previous 
editions, they may not become aware of the 
existence of some established tests. At least 
a half dozen of the best known, most used 
(and frequently most carefully studied) tests 
of manual dexterity are not mentioned in the 
Fourth Yearbook, nor is the well-known 
Minnesota Clerical Test. Current yearbooks 
should at least list tests previously reviewed 
with a cross reference to the appropriate vol- 
ume. Exclusion criteria might be developed 
so that such lists would not be cluttered with 
the measurement whims of the century. 

Added features require space, and space 
has always been a problem for Buros. The 
“Books and Reviews” section appears to offer 
less that is new and to serve a more limited 
readership. To the extent that this section is 
a drain on the “Tests and Reviews” section, 
it is here that space economies should be 
effected. 


Charles N. Morris 


Teachers College, Columbia University 





New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to Donald G. Paterson, 
Editor, Department of Psychology, University of Minnesota, Minneapolis 14, Minnesota. 


Problems of consciousness. Harold A. Abram- 
son, Editor. New York: The Josiah Macy, 
Jr. Foundation, 1954. Pp. 177. $3.25. 

Rorschach responses in the aged. Louise 
Bates Ames, Janet Learned, Ruth W. 
Metraux, and Richard N. Walker. New 
York: Paul B. Hoeber, Inc., Medical Book 
Department of Harper & Brothers, 1954. 
Pp. 244. $6.75. 

Psychological testing. Anne Anastasi. New 
York: The Macmillan Company, 1954. 
Pp. 240. $4.25. 

The exteriorization of the mental body. 
James Baker, Jr. New York: The Wil- 
liam-Frederick Press, 1954. Pp. 32. $1.50. 

Psychology of personnel in business and in- 
dustry. Second Edition. Roger M. Bel- 
lows. New York: Prentice-Hall, Inc., 
1954. Pp. 467. $7.35. 

Employment psychology: the interview. 
Roger M. Bellows and M. Frances Estep. 
New York: Rinehart & Company, Inc., 
1954. Pp. 295. $4.25. 

After high school—what? Ralph F. Berdie. 
Minneapolis: University of Minnesota 
Press, 1954. Pp. 240. $4.25. 

Columbia mental maturity scale. Bessie B. 
Burgemeister, Lucille Hollander Blum, and 
Irving Lorge. Yonkers-on-Hudson, N. Y.: 
World Book Company, 1954. Examiner’s 
Kit: 100 items, and a comprehensive 
Manual. $35.00. Individual Record 
Blanks are priced at $.85 per package of 
35. 

The sociology of work. Theodore Caplow. 
Minneapolis: University of Minnesota 
Press, 1954. Pp. 330. $5.00. 

Manual of child psychology. Second Edi- 
tion. Leonard Carmichael, Editor. New 
York: John Wiley & Sons, Inc., 1954. Pp. 
1,295. $12.00. 

Sociology perspective. Ely Chinoy. New 
York: Doubleday and Company, Inc., 
1954. Pp. 58. $.85. 

Introduction to logic. Irving M. Copi. New 
York: The Macmillan Company, 1953. 
Pp. 472. $4.00. 

Symbolic logic. Irving M. Copi. New York: 
The Macmillan Company, 1953. Pp. 472. 
$5.00. 

Religion and human behavior. Simon Doni- 
ger, Editor. New York: Association Press, 
1954. Pp. 233. $3.00. 

Production guides and controls for the mod- 
ern executive. M. J. Dooher, Editor. 
New York: American Management Asso- 
ciation, 1953. Pp. 52. $1.25. 

Stepping up office efficiency. M. J. Dooher, 
Editor. New York: American Manage- 
ment Association, 1953. Pp. 46. $1.25. 

Streamlining office equipment and service. 
M. J. Dooher, Editor. New York: Ameri- 
can Management Association, 1953. Pp. 
$5. $1.25. 

Gearing up for better production. M. J. 
Dooher, Editor. New York: American 
Management Association, 1953. Pp. 58. 
$1.25. 

The human side of the office manager’s job. 
M. J. Dooher, Editor. New York: Ameri- 
can Management Association, 1953. Pp. 
40. $1.25. 

A critical look at the insurance buyer's role. 
M. J. Dooher, Editor. New York: Ameri- 
can Management Association, 1953. Pp. 
35. $1.25. 

Maintaining a dynamic insurance program. 
M. J. Dooher, Editor. New York: Ameri- 
can Management Association, 1953. Pp. 
44. $1.25. 

Industry at the bargaining table. M. J. 
Dooher, Editor. New York: American 
Management Association, 1954. Pp. 51. 
$1.25. 

Selling costs and market potential: controls 
and guides. M. J. Dooher, Editor. New 
York: American Management Association, 
1954. Pp. 38. $1.25. 

Modern learning theory. William K. Estes, 
Sigmund Koch, Kenneth MacCorquodale, 
Paul E. Meehl, Conrad G. Mueller, Wil- 
liam N. Schoenfeld, and William S. Ver- 
planck. New York: Appleton-Century- 
Crofts, Inc., 1954. Pp. 424. 

Mind and performance. Harold Kenneth 
Fink. New York: Vantage Press, 1954. 
Pp. 113. $3.00. 

Human behavior in industry. William W. 
Finlay, A. Q. Sartain, and Willis M. Tate. 
New York: McGraw-Hill Book Company, 
Inc., 1954. Pp. 247. $4.00. 

A psychological glossary. D. C. Fraser. 
Cambridge, England: W. Heffer & Sons, 
Ltd., 1954. Pp. 40. 3s. 6d. net. 

Methods of research. Carter V. Good and 
Douglas E. Scates. New York: Appleton- 
Century-Crofts, Inc., 1954. Pp. 896. $5.50. 

The life and ideas of the Marquis De Sade. 
Geoffrey Gorer. New York: The British 
Book Centre, Inc., 1954. Pp. 244. $3.50. 

Child psychology. Fourth Edition. Arthur 
T. Jersild. New York: Prentice-Hall, Inc., 
1954. Pp. 676. $6.00. 

The practice of psychotherapy. C. G. Jung. 
New York: Bollingen Series, 140 East 62nd 
Street, 1954. Pp. 377. $4.50. 

Know your reader. George R. Klare and 
Byron Buck. New York: Hermitage 
House, Inc., 1954. Pp. 192. $2.95. 

The technique of handling people. Revised 
Edition. Donald A. and Eleanor C. Laird. 
New York: McGraw-Hill Book Company, 
Inc., 1954. Pp. 189. $3.75. 

Towards an understanding of juvenile delin- 
quency. Bernard Lander. New York: 
Columbia University Press, 1954. Pp. 143. 
$3.00. 

Your child and his art. Viktor Lowenfeld. 
New York: Macmillan Company, 1954. 
Pp. 186. $6.50. 

Break down the walls. John Bartlow Mar- 
tin. New York: Ballantine Books, 1954. 
Pp. 310. Paperbound edition $.50. Hard- 
bound edition $3.50. 

A new approach to office management: inte- 
grated data processing through common 
language machines. Elizabeth Marting, 
Editor. New York: American Manage- 
ment Association, 1954. Pp. 62. $2.50. 

People’s Padre. Emmett McLoughlin. Bos- 
ton: Beacon Press, 1954. Pp. 288. $3.95. 

How to enjoy yourself. Albert A. Ostrow. 
New York: E. P. Dutton & Co., Inc., 1954. 
Pp. 259. $2.95. 

Psychology. William J. Pitt and Jacob A. 
Goldberg. New York: McGraw-Hill Book 
Company, Inc., 1954. Pp. 414. $4.50. 

Psychology and life. Fourth Edition. Floyd 
L. Ruch. New York: Scott Foresman and 
Company, 1954. Pp. 496. $5.00. 

Letters to my daughter. Dagobert D. Runes. 
New York: Philosophical Library, 1954. 
Pp. 131. $2.50. 

Principles of industrial psychology. Thomas 
Arthur Ryan and Patricia Cain Smith. 
New York: Ronald Press Company, 1954. 
Pp. 534. $5.50. 

Selected writings of De Sade. Leonard de 
Saint-Yves. New York: The British Book 
Centre, Inc., 1954. Pp. 306. $6.75. 

Case studies in management development: 
theory and practice in ten selected com- 
panies. Robert G. Simpson. New York: 
American Management Association, 1953. 
Pp. 140. $2.50. 

A survey of management development: the 
quantitative aspects. Joseph M. Trickett. 
New York: American Management Asso- 
ciation, 1953. Pp. 64. $1.25. 

Management education in American business. 
Lyndall F. Urwick. New York: American 
Management Association, 1953. Pp. 136. 
$1.50. 

An annotated bibliography of word associa- 
tion references important to marketing re- 
searchers. James M. Vicary. New York: 
James M. Vicary Company, 20 East 60th 
Street. Pp. 5. Gratis. 

The education of employees: a status report. 
Douglas Williams and Stanley Peterfreund. 
New York: American Management Asso- 
ciation, 1953. Pp. 65. $1.25. 

Personality through perception: an experi- 
mental and clinical study. H. A. Witkin, 
H. B. Lewis, M. Hertzman, K. Machover, 
P. Bretnall Meissner, and S. Wapner. New 
York: Harper & Brothers, 1954. Pp. 571. 
$7.50. 

Audio-visual materials: their nature and use. 
Walter Arno Wittich and Charles F. Schul- 
ler. New York: Harper & Brothers, 1953. 
Pp. 564. $6.00. 

Psychology in the nursery school. Nelly 
Wolffheim. New York: Philosophical Li- 
brary, 1953. Pp. 144. $3.75. 

Journal of counseling psychology. C. Gilbert 
Wrenn, Editor. Business Office: Room 2, 
Old Armory, Ohio State University, Co- 
lumbus 10, Ohio. $6.00 per year. $1.75 
per issue. Issued bi-monthly. 

Reading rapidly and well. Revised Edition. 
C. Gilbert Wrenn and Luella Cole. Stan- 
ford, Calif.: Stanford University Press, 
1954. Pp. 16. $.15. 

The language of dynamic psychology. Jo- 
seph W. Wulfeck and Edward M. Bennett. 
New York: McGraw-Hill Book Company, 
Inc., 1954. Pp. 111. $4.00. 

Administration and the teacher. William A. 
Yeager. New York: Harper & Brothers, 
1954. Pp. 577. $4.50. 

The pre-adolescent exceptional child. Child 
Research Clinic of the Woods Schools. 
Langhorne, Pa.: The Child Research Clinic 
of the Woods Schools, 1953. Pp. 70. 
Gratis. 

This we believe about education. Educa- 
tional Advisory Committee and Council. 
New York: National Association of Manu- 
facturers. Pp. 32. 

Studies in schizophrenia. Tulane Depart- 
ment of Psychiatry and Neurology. Cam- 
bridge, Mass.: Published for the Common- 
wealth Fund by the Harvard University 
Press, 1954. Pp. 619. $8.50. 

Statistics of public secondary day schools, 
1951-1952. U.S. Department of Health, 
Education, and Welfare. Washington 25, 
D. C.: Superintendent of Documents, U. S. 
Government Printing Office, 1954. Pp. 81. 
$.35. 